Abstract

This paper presents Riposte, a new system for anonymous broadcast messaging. Riposte is the first such system, to our knowledge, that simultaneously protects against traffic-analysis attacks, prevents anonymous denial-of-service by malicious clients, and scales to million-user anonymity sets. To achieve these properties, Riposte makes novel use of techniques used in systems for private information retrieval and secure multi-party computation. For latency-tolerant workloads with many more readers than writers (e.g., Twitter, Wikileaks), we demonstrate that a three-server Riposte cluster can build an anonymity set of 2,895,216 users in 32 hours.


1 Introduction 

In a world of ubiquitous network surveillance [7, 38, 39, 43, 66], prospective whistleblowers face a daunting task. Consider, for example, a government employee who wants to anonymously leak evidence of waste, fraud, or incompetence to the public. The whistleblower could email an investigative reporter directly, but post hoc analysis of email server logs could easily reveal the tipster’s identity. The whistleblower could contact a reporter via Tor [31] or another low-latency anonymizing proxy [35, 57, 63, 75], but this would leave the leaker vulnerable to traffic-analysis attacks [4, 64, 65]. The whistleblower could instead use an anonymous messaging system that protects against traffic-analysis attacks [17, 42, 81], but these systems typically support only relatively small anonymity sets (tens of thousands of users, at most). Protecting whistleblowers in the digital age requires anonymous messaging systems that provide strong security guarantees but that also scale to very large network sizes.

This is the extended version of a paper by the same name that appeared 
at the IEEE Symposium on Security and Privacy in May 2015. 


In this paper, we present a new system that attempts to make traffic-analysis-resistant anonymous broadcast messaging practical at Internet scale. Our system, called Riposte, allows a large number of clients to anonymously post messages to a shared “bulletin board,” maintained by a small set of minimally trusted servers. (As few as three non-colluding servers are sufficient.) Whistleblowers could use Riposte as a platform for anonymously publishing Tweet- or email-length messages and could combine it with standard public-key encryption to build point-to-point private messaging channels.

While there is an extensive literature on anonymity systems [26, 32], Riposte offers a combination of security and scalability properties unachievable with current designs. To the best of our knowledge, Riposte is the only anonymous messaging system that simultaneously:

1. protects against traffic analysis attacks, 

2. prevents malicious clients from anonymously executing denial-of-service attacks, and

3. scales to anonymity set sizes of millions of users, for 
certain latency-tolerant applications. 

We achieve these three properties in Riposte by adapting three different techniques from the cryptography and privacy literature. First, we defeat traffic-analysis attacks and protect against malicious servers by using a protocol, inspired by client/server DC-nets [17, 81], in which every participating client sends a fixed-length secret-shared message to the system’s servers in every time epoch. Second, we achieve efficient disruption resistance by using a secure multi-party protocol to quickly detect and exclude malformed client requests [33, 45, 82]. Third, we achieve scalability by leveraging a specific technique developed in the context of private information retrieval (PIR) to minimize the number of bits each client must upload to each server in every time epoch. The tool we use is called a distributed point function [20, 41]. The novel synthesis of these techniques leads to a system that is efficient (in terms of bandwidth and computation) and practical, even for large anonymity sets.
Our particular use of private information retrieval (PIR) protocols is unusual: PIR systems [21] allow a client to efficiently read a row from a database, maintained collectively at a set of servers, without revealing to the servers which row it is reading. Riposte achieves scalable anonymous messaging by running a private information retrieval protocol in reverse: with reverse PIR, a Riposte client can efficiently write into a database maintained at the set of servers without revealing to the servers which row it has written [71].

As we discuss later on, a large Riposte deployment could form the basis for an anonymous Twitter service. Users would “tweet” by using Riposte to anonymously write into a database containing all clients’ tweets for a particular time period. In addition, by having read-only users submit “empty” writes to the system, the effective anonymity set can be much larger than the number of writers, with little impact on system performance.

Messaging in Riposte proceeds in regular time epochs (e.g., each time epoch could be one hour long). To post a message, the client generates a write request, cryptographically splits it into many shares, and sends one share to each of the Riposte servers. A coalition of servers smaller than a certain threshold cannot learn anything about the client’s message or write location given its subset of the shares.

The Riposte servers collect write requests until the end of the time epoch, at which time they publish the aggregation of the write requests they received during the epoch. From this information, anyone can recover the set of posts uploaded during the epoch, but the system reveals no information about who posted which message. The identity of the entire set of clients who posted during the interval is known, but no one can link a client to a post. (Thus, each time epoch must be long enough to ensure that a large number of honest clients are able to participate in each epoch.)

In this paper, we describe two Riposte variants, which offer slightly different security properties. The first variant scales to very large network sizes (millions of clients) but requires three servers such that no two of these servers collude. The second variant is more computationally expensive, but provides security even when all but one of the s > 1 servers are malicious. Both variants maintain their security properties when network links are actively adversarial, when all but two of the clients are actively malicious, and when the servers are actively malicious (subject to the non-collusion requirement above).

The three-server variant uses a computationally inexpensive multi-party protocol to detect and exclude malformed client requests. (Figure 1 depicts this protocol at a high level.) The s-server variant uses client-produced zero-knowledge proofs to guarantee the well-formedness of client requests.

Unlike Tor [31] and other low-latency anonymity systems [42, 52, 57, 75], Riposte protects against active traffic-analysis attacks by a global network adversary. Prior systems have offered traffic-analysis resistance only at the cost of scalability:

• Mix-net-based systems [18] require large zero-knowledge proofs of correctness to provide privacy in the face of active attacks by malicious servers [2, 5, 36, 49, 69].

• DC-nets-based systems require clients to transfer data linear in the size of the anonymity set [17, 81] and rely on expensive zero-knowledge proofs to protect against malicious clients [24, 48].

We discuss these systems and other prior work in Section 7.

Experiments. To demonstrate the practicality of Riposte for anonymous broadcast messaging (i.e., anonymous whistleblowing or microblogging), we implemented and evaluated the complete three-server variant of the system. When the servers maintain a database table large enough to fit 65,536 160-byte Tweets, the system can process 32.8 client write requests per second. In Section 6.3, we discuss how to use a table of this size as the basis for very large anonymity sets in read-heavy applications. When using a larger 377 MB database table (over 2.3 million 160-byte Tweets), a Riposte cluster can process 1.4 client write requests per second.

Writing into a 377 MB table requires each client to upload less than 1 MB of data to the servers. In contrast, a two-server DC-net-based system would require each client to upload more than 750 MB of data. More generally, to process a Riposte client request for a table of size L, clients and servers perform only $O(\sqrt{L})$ bytes of data transfer.

The servers’ AES-NI encryption throughput limits the rate at which Riposte can process client requests at large table sizes. Thus, the system’s capacity to handle client write requests scales with the number of available CPU cores. A large Riposte deployment could shard the database table across k machines to achieve a near-k-fold speedup.

We tested the system with anonymity set sizes of up to 2,895,216 clients, with a read-heavy latency-tolerant microblogging workload. To our knowledge, this is the largest anonymity set ever constructed in a system defending against traffic-analysis attacks. Prior DC-net-based systems scaled to 5,120 clients [81] and prior verifiable-shuffle-based systems scaled to 100,000 clients [5]. In contrast, Riposte scales to millions of clients for certain applications.

[Figure 1 diagram: a client, database Server A, database Server B, and the audit server.]

(a) A client submits one share of its write request to each of the two database servers. If the database has length L, each share has length $O(\sqrt{L})$.

(b) The database servers generate blinded “audit request” messages derived from their shares of the write request.

(c) The audit server uses the audit request messages to validate the client’s request and returns an “OK” or “Invalid” bit to the database servers.

(d) The servers apply the write request to their local database state. The XOR of the servers’ states contains the client’s message at the given row.

Figure 1: The process of handling a single client write request. The servers run this process once per client in each time epoch.

Contributions. This paper contributes: 

• two new bandwidth-efficient and traffic-analysis- 
resistant anonymous messaging protocols, obtained 
by running private information retrieval protocols “in 
reverse” (Sections 3 and 4), 

• a fast method for excluding malformed client requests (Section 5),

• a method to recover from transmission collisions in 
DC-net-style anonymity systems, 

• experimental evaluation of these protocols with anonymity set sizes of up to 2,895,216 users (Section 6).

In Section 2, we introduce our goals, threat model, and security definitions. Section 3 presents the high-level system architecture. Section 4 and Section 5 detail our techniques for achieving bandwidth efficiency and disruption resistance in Riposte. We evaluate the performance of the system in Section 6, survey related work in Section 7, and conclude in Section 8.

2 Goals and Problem Statement 

In this section, we summarize the high-level goals of the 
Riposte system and present our threat model and security 
definitions. 

2.1 System Goals 

Riposte implements an anonymous bulletin board using a primitive we call a write-private database scheme. Riposte enables clients to write into a shared database, collectively maintained at a small set of servers, without revealing to the servers the location or contents of the write. Conceptually, the database table is just a long fixed-length bitstring divided into fixed-length rows.

To write into the database, a client generates a write request. The write request encodes the message to be written and the row index at which the client wants to write. (A single client write request modifies a single database row at a time.) Using cryptographic techniques, the client splits its write request into a number of shares and sends one share to each of the servers. By construction of the shares, no coalition of servers smaller than a particular pre-specified threshold can learn the contents of a single client’s write request. While the cluster of servers must remain online for the duration of a protocol run, a client need only stay online long enough to upload its write request to the servers. As soon as the servers receive a write request, they can apply it to their local state.

The Riposte cluster divides time into a series of epochs. During each time epoch, servers collect many write requests from clients. When the servers agree that the epoch has ended, they combine their shares of the database to reveal the clients’ plaintext messages. A particular client’s anonymity set consists of all of the honest clients who submitted write requests to the servers during the time epoch. Thus, if 50,000 distinct honest clients submitted write requests during a particular time epoch, each honest client is perfectly anonymous amongst this set of 50,000 clients.

The epoch could be measured in time (e.g., 4 hours), in a number of write requests (e.g., accumulate 10,000 write requests before ending the epoch), or by some more complicated condition (e.g., wait for a signed write request from each of the 150 users identified by a pre-defined list of public keys). The definition of what constitutes an epoch is crucial for security, since a client’s anonymity set is only as large as the number of honest clients who submit write requests in the same epoch [77].
When using Riposte as a platform for anonymous microblogging, the rows would be long enough to fit a Tweet (140 bytes) and the number of rows would be some multiple of the number of anticipated users. To anonymously Tweet, a client would use the write-private database scheme to write its message into a random row of the database. After many clients have written to the database, the servers can reveal the clients’ plaintext Tweets. The write-privacy of the database scheme prevents eavesdroppers, malicious clients, and coalitions of malicious servers (smaller than a particular threshold) from learning which client posted which message.

2.2 Threat Model 

Clients in our system are completely untrusted: they may 
submit maliciously formed write requests to the system 
and may collude with servers or with arbitrarily many 
other clients to try to break the security properties of the 
system. 

Servers in our system are trusted for availability. The 
failure—whether malicious or benign—of any one server 
renders the database state unrecoverable but does not 
compromise the anonymity of the clients. To protect 
against benign failures, server maintainers could implement a single “logical” Riposte server with a cluster of many physical servers running a standard state-machine-replication protocol [58, 70].

For each of the cryptographic instantiations of Riposte, 
there is a threshold parameter t that defines the number of 
malicious servers that the system can tolerate while still 
maintaining its security properties. We make no assumptions about the behavior of malicious servers: they can misbehave by publishing their secret keys, by colluding with coalitions of up to t malicious servers and arbitrarily many clients, or by mounting any other sort of attack against the system.

The threshold t depends on the particular cryptographic primitives in use. For our most secure scheme, all but one of the servers can collude without compromising client privacy (t = |Servers| - 1). For our most efficient scheme, no two servers can collude (t = 1).

2.3 Security Goals 

The Riposte system implements a write-private and 
disruption-resistant database scheme. We describe the 
correctness and security properties for such a scheme 
here. 

Definition 1 (Correctness). The scheme is correct if, when all servers execute the protocol faithfully, the plaintext state of the database revealed at the end of a protocol run is equal to the result of applying each valid client write request to an empty database (i.e., a database of all zeros).

Since we rely on all servers for availability, correctness 
need only hold when all servers run the protocol correctly. 

To be useful as an anonymous bulletin board, the database scheme must be write-private and disruption resistant. We define these security properties here.

(s, t)-Write Privacy. Intuitively, the system provides (s, t)-write-privacy if an adversary’s advantage at guessing which honest client wrote into a particular row of the database is negligibly better than random guessing, even when the adversary controls all but two clients and up to t out of s servers (where t is a parameter of the scheme). We define this property in terms of a privacy game, given in full in Appendix A.

Definition 2 ((s, t)-Write Privacy). We say that the protocol provides (s, t)-write privacy if the adversary’s advantage in the security game of Appendix A is negligible in the (implicit) security parameter.

Riposte provides a very robust sort of privacy: the adversary can select the messages that the honest clients will send and can send maliciously formed messages that depend on the honest clients’ messages. Even then, the adversary still cannot guess which client uploaded which message.

Disruption resistance. The system is disruption resistant if an adversary who controls n clients can write into at most n database rows during a single time epoch. A system that lacks disruption resistance might be susceptible to denial-of-service attacks: a malicious client could corrupt every row in the database with a single write request. Even worse, the write privacy of the system might prevent the servers from learning which client was the disruptor. Preventing such attacks is a major focus of prior anonymous messaging schemes [17, 42, 48, 79, 81]. Under our threat model, we trust all servers for availability of the system (though not for privacy). Thus, our definition of disruption resistance concerns itself only with clients attempting to disrupt the system; we do not try to prevent servers from corrupting the database state.

We formally define disruption resistance using the following game, played between a challenger and an adversary. In this game, the challenger plays the role of all of the servers and the adversary plays the role of all clients.

1. The adversary sends n write requests to the challenger (where n is less than or equal to the number of rows in the database).
2. The challenger runs the protocol for a single time epoch, playing the role of the servers. The challenger then combines the servers’ database shares to reveal the plaintext output.

The adversary wins the game if the plaintext output 
contains more than n non-zero rows. 

Definition 3 (Disruption Resistance). We say that the protocol is disruption resistant if the probability that the adversary wins the game above is negligible in the (implicit) security parameter.

2.4 Intersection Attacks 

Riposte makes it infeasible for an adversary to determine which client posted which message within a particular time epoch. If an adversary can observe traffic patterns across many epochs, as the set of online clients changes, the adversary can make statistical inferences about which client is sending which stream of messages [28, 55, 60]. These “intersection” or “statistical disclosure” attacks affect many anonymity systems and defending against them is an important, albeit orthogonal, problem [60, 80]. Even so, intersection attacks typically become more difficult to mount as the size of the anonymity set increases, so Riposte’s support for very large anonymity sets makes it less vulnerable to these attacks than are many prior systems.

3 System Architecture 

As described in the prior section, a Riposte deployment consists of a small number of servers, who maintain the database state, and a large number of clients. To write into the database, a client splits its write request using secret sharing techniques and sends a single share to each of the servers. Each server updates its database state using the client’s share. After collecting write requests from many clients, the servers combine their shares to reveal the plaintexts represented by the write requests. The security requirement is that no coalition of t servers can learn which client wrote into which row of the database.

3.1 A First-Attempt Construction: Toy Protocol

As a starting point, we sketch a simple “straw man” construction that demonstrates the techniques behind our scheme. This first-attempt protocol shares some design features with anonymous communication schemes based on client/server DC-nets [17, 81].


In the simple scheme, we have two servers, A and B, and each server stores an L-bit bitstring, initialized to all zeros. We assume for now that the servers do not collude, i.e., that one of the two servers is honest. The bitstrings represent shares of the database state and each “row” of the database is a single bit.

Consider a client who wants to write a “1” into row $\ell$ of the database. To do so, the client generates a random L-bit bitstring $r$. The client sends $r$ to server A and $r \oplus e_\ell$ to server B, where $e_\ell$ is an L-bit vector of zeros with a one at index $\ell$ and $\oplus$ denotes bitwise XOR. Upon receiving the write request from the client, each server XORs the received string into its share of the database.

After processing n write requests, the database state at 
server A will be: 

$$d_A = r_1 \oplus \cdots \oplus r_n$$

and the database at server B will be:

$$d_B = (e_{\ell_1} \oplus \cdots \oplus e_{\ell_n}) \oplus (r_1 \oplus \cdots \oplus r_n) = (e_{\ell_1} \oplus \cdots \oplus e_{\ell_n}) \oplus d_A$$

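At the end of the time epoch, the servers can reveal the plaintext database by combining their local states $d_A$ and $d_B$: since $d_A \oplus d_B = e_{\ell_1} \oplus \cdots \oplus e_{\ell_n}$, the XOR of the two states has a one in every row that was written an odd number of times.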
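For concreteness, the toy protocol can be sketched in a few lines of Python. This is a minimal illustration, not Riposte's implementation: it assumes one-bit rows stored as Python lists, uses the standard-library `secrets` module as the client's source of randomness, and the helper names `make_write_request`, `apply_share`, and `reveal` are hypothetical. Three clients write into distinct rows, and XORing the two servers' states recovers exactly those rows.

```python
from __future__ import annotations

import secrets

L = 16  # number of one-bit rows (toy size, chosen for illustration)


def make_write_request(row: int, length: int = L) -> tuple[list[int], list[int]]:
    """Split a write of "1" into `row` into the shares (r, r XOR e_row)."""
    r = [secrets.randbits(1) for _ in range(length)]  # uniformly random bitstring
    e = [1 if i == row else 0 for i in range(length)]  # unit vector e_row
    return r, [ri ^ ei for ri, ei in zip(r, e)]


def apply_share(state: list[int], share: list[int]) -> None:
    """A server XORs the received share into its database share, in place."""
    for i, bit in enumerate(share):
        state[i] ^= bit


def reveal(d_a: list[int], d_b: list[int]) -> list[int]:
    """End of epoch: d_A XOR d_B equals e_{l_1} XOR ... XOR e_{l_n}."""
    return [a ^ b for a, b in zip(d_a, d_b)]


d_a, d_b = [0] * L, [0] * L  # both servers start from all-zero bitstrings
for row in (3, 7, 12):  # three clients write into distinct rows
    share_a, share_b = make_write_request(row)
    apply_share(d_a, share_a)
    apply_share(d_b, share_b)

print(reveal(d_a, d_b))  # ones exactly at rows 3, 7, and 12
```

Each individual share is a uniformly random string, which is why a single non-colluding server learns nothing about the chosen row.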

The construction generalizes to fields larger than $\mathbb{F}_2$. For example, each “row” of the database could be a k-bit bitstring instead of a single bit. To prevent impersonation, network-tampering, and replay attacks, we use authenticated and encrypted channels with per-message nonces bound to the time epoch identifier.

This protocol satisfies the write-privacy property as long as the two servers do not collude (assuming that the clients and servers deploy the replay attack defenses mentioned above). Indeed, server A can information-theoretically simulate its view of a run of the protocol given only $e_{\ell_1} \oplus \cdots \oplus e_{\ell_n}$ as input. A similar argument shows that the protocol is write-private with respect to server B as well.

This first-attempt protocol has two major limitations. The first limitation is that it is not bandwidth-efficient. If millions of clients want to use the system in each time epoch, then the database must be at least millions of bits in length. To flip a single bit in the database, then, each client must send millions of bits to each server, in the form of a write request.

The second limitation is that it is not disruption resistant: a malicious client can corrupt the entire database with a single malformed request. To do so, the malicious client picks random L-bit bitstrings $r$ and $r'$, sends $r$ to server A, and sends $r'$ (instead of $r \oplus e_\ell$) to server B. Thus, a single malicious client can efficiently and anonymously deny service to all honest clients.
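The disruption attack is easy to reproduce in the same toy setting. In the illustrative sketch below (the table size L = 1024 is an arbitrary choice, not a Riposte parameter), one malicious client sends independent random strings $r$ and $r'$ to the two servers. The revealed database is then $r \oplus r'$, a uniformly random string, so roughly half of the rows come out non-zero after a single request, which the game behind Definition 3 (with n = 1) counts as a win for the adversary.

```python
import secrets

L = 1024  # number of one-bit rows (toy size, chosen for illustration)


def random_bits(length: int) -> list[int]:
    """Uniformly random bitstring of the given length."""
    return [secrets.randbits(1) for _ in range(length)]


# A malformed request: r and r' are independent, so r' is not r XOR e_l for any l.
r, r_prime = random_bits(L), random_bits(L)

d_a = [0] * L  # server A's share of the database
d_b = [0] * L  # server B's share of the database
for i in range(L):
    d_a[i] ^= r[i]        # server A applies the share it received
    d_b[i] ^= r_prime[i]  # server B applies the share it received

plaintext = [a ^ b for a, b in zip(d_a, d_b)]  # equals r XOR r'
corrupted = sum(plaintext)  # number of non-zero rows after one request
print(f"1 malformed request corrupted {corrupted} of {L} rows")
```

Because neither server sees anything but a random-looking string, the servers cannot tell which client caused the damage, which is the motivation for the disruption-resistance techniques of Section 5.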

Improving bandwidth efficiency and adding disruption 
resistance are the two core contributions of this work, and 
we return to them in Sections 4 and 5. 


3.2 Collisions 


Putting aside the issues of bandwidth efficiency and disruption resistance for the moment, we now discuss the issue of colliding writes to the shared database. If clients write into random locations in the database, there is some chance that one client’s write request will overwrite a previous client’s message. If client A writes message $m_A$ into location $\ell$, client B might later write message $m_B$ into the same location $\ell$. In this case, row $\ell$ will contain $m_A \oplus m_B$, and the contents of row $\ell$ will be unrecoverable.

To address this issue, we set the size of the database table to be large enough to accommodate the expected number of write requests for a given “success rate.” For example, the servers can choose a table size that is large enough to accommodate $2^{10}$ write requests such that 95% of write requests will not be involved in a collision (in expectation). Under these parameters, 5% of the write requests will fail and those clients will have to resubmit their write requests in a future time epoch.

We can determine the appropriate table size by solving 
a simple “balls and bins” problem. If we throw m balls 
independently and uniformly at random into n bins, how 
many bins contain exactly one ball? Here, the m balls 
represent the write requests and the n bins represent the 
rows of the database. 

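Let $B_{ij}$ be the event that ball $i$ falls into bin $j$. For all $i$ and $j$, $\Pr[B_{ij}] = 1/n$. Let $O_i^{(1)}$ be the event that exactly one ball falls into bin $i$. Then

$$\Pr\left[O_i^{(1)}\right] = \binom{m}{1} \frac{1}{n} \left(1 - \frac{1}{n}\right)^{m-1}$$

Expanding using the binomial theorem and ignoring low-order terms, we obtain

$$\Pr\left[O_i^{(1)}\right] \approx \frac{m}{n} - \left(\frac{m}{n}\right)^2 + \frac{1}{2}\left(\frac{m}{n}\right)^3$$

where the approximation ignores terms of order $(m/n)^4$ and $o(1/n)$. Then $n \cdot \Pr[O_i^{(1)}]$ is the expected number of bins with exactly one ball, which is the expected number of messages successfully received. Dividing this quantity by $m$ gives the expected success rate, so that:

$$\mathbb{E}[\text{SuccessRate}] = \frac{n}{m}\Pr\left[O_i^{(1)}\right] \approx 1 - \frac{m}{n} + \frac{1}{2}\left(\frac{m}{n}\right)^2$$

So, if we want an expected success rate of 95%, then we need $n \approx 19.5m$. For example, with $m = 2^{10}$ writers, we would use a table of size $n \approx 20{,}000$.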
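As a sanity check on the 95% example, the following Monte Carlo sketch (illustrative only; the trial count, seed, and function name are arbitrary choices, not part of Riposte) throws $m = 2^{10}$ balls into $n = 20{,}000$ bins and measures the fraction of balls that land alone. It compares the result with the exact per-ball probability $(1 - 1/n)^{m-1}$ and with the approximation $1 - m/n + \frac{1}{2}(m/n)^2$; all three should come out close to 0.95.

```python
import random
from collections import Counter


def simulated_success_rate(m: int, n: int, trials: int, seed: int = 1) -> float:
    """Average fraction of m balls that land alone in one of n bins."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    total = 0.0
    for _ in range(trials):
        counts = Counter(rng.randrange(n) for _ in range(m))  # bin -> ball count
        singletons = sum(1 for c in counts.values() if c == 1)
        total += singletons / m
    return total / trials


m, n = 2**10, 20_000
exact = (1 - 1 / n) ** (m - 1)  # Pr[no other ball lands in my bin]
approx = 1 - m / n + 0.5 * (m / n) ** 2  # second-order approximation
sim = simulated_success_rate(m, n, trials=200)
print(f"exact={exact:.4f} approx={approx:.4f} simulated={sim:.4f}")
```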

Handling collisions. We can shrink the table size $n$ by coding the writes so that we can recover from collisions. We show how to handle two-way collisions, that is, collisions in which at most two clients write to the same location in the database. Let us assume that the messages being written to the database are elements in some field $\mathbb{F}$ of odd characteristic (say $\mathbb{F} = \mathbb{F}_p$ where $p = 2^{64} - 59$). We replace the XOR operation used in the basic scheme by addition in $\mathbb{F}$.

To recover from a two-way collision we will need to 
double the size of each cell in the database, but the overall 
number of cells n will shrink by more than a factor of two. 

When a client A wants to write the message $m_A \in \mathbb{F}$ to location $\ell$ in the database, the client will actually write the pair $(m_A, m_A^2) \in \mathbb{F}^2$ into that location. Clearly, if no collision occurs at location $\ell$, then recovering $m_A$ at the end of the epoch is trivial: simply drop the second coordinate (it is easy to test that no collision occurred because the second coordinate is the square of the first). Now, suppose a collision occurs with some client B who also added her own message $(m_B, m_B^2) \in \mathbb{F}^2$ to the same location $\ell$ (and no other client writes to location $\ell$). Then at the end of the epoch the published values are

$$S_1 = m_A + m_B \pmod{p} \quad \text{and} \quad S_2 = m_A^2 + m_B^2 \pmod{p}$$

From these values it is quite easy to recover both $m_A$ and $m_B$ by observing that

$$2S_2 - S_1^2 = (m_A - m_B)^2 \pmod{p}$$

from which we obtain $m_A - m_B$ by taking a square root modulo $p$ (it does not matter which of the two square roots we use; they both lead to the same result). Since $S_1 = m_A + m_B$ is also given, it is now easy to recover both $m_A$ and $m_B$.
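The two-way decoding step can be sketched in Python for the field mentioned above, $p = 2^{64} - 59$. Because this $p$ is congruent to 1 mod 4, the simple $a^{(p+1)/4}$ square-root shortcut does not apply, so the sketch uses the standard Tonelli-Shanks algorithm; that choice, and the helper names `sqrt_mod` and `recover`, are assumptions of this illustration rather than details taken from Riposte. `recover` takes the published pair $(S_1, S_2)$ for one cell and returns the one or two messages it contains.

```python
p = 2**64 - 59  # prime modulus from the text; p % 4 == 1, so use Tonelli-Shanks


def sqrt_mod(a: int, p: int) -> int:
    """Tonelli-Shanks square root of a quadratic residue a modulo an odd prime p."""
    a %= p
    if a == 0:
        return 0
    if pow(a, (p - 1) // 2, p) != 1:  # Euler's criterion
        raise ValueError("not a quadratic residue")
    q, s = p - 1, 0
    while q % 2 == 0:  # write p - 1 = q * 2^s with q odd
        q //= 2
        s += 1
    z = 2
    while pow(z, (p - 1) // 2, p) != p - 1:  # find a quadratic non-residue
        z += 1
    m, c, t, r = s, pow(z, q, p), pow(a, q, p), pow(a, (q + 1) // 2, p)
    while t != 1:
        i, t2 = 0, t
        while t2 != 1:  # least i with t^(2^i) == 1
            t2 = t2 * t2 % p
            i += 1
        b = pow(c, 1 << (m - i - 1), p)
        m, c, t, r = i, b * b % p, t * b * b % p, r * b % p
    return r


def recover(s1: int, s2: int) -> tuple:
    """Decode one cell holding the sum of (m, m^2) pairs from at most two writers."""
    if s2 == s1 * s1 % p:  # second coordinate is the square of the first
        return (s1,)       # no collision at this cell
    d = sqrt_mod(2 * s2 - s1 * s1, p)  # d = +/-(m_A - m_B)
    inv2 = pow(2, -1, p)  # modular inverse of 2 (Python 3.8+)
    return tuple(sorted(((s1 + d) * inv2 % p, (s1 - d) * inv2 % p)))


m_a, m_b = 123456789, 987654321  # two colliding messages in F_p
s1 = (m_a + m_b) % p
s2 = (m_a * m_a + m_b * m_b) % p
print(recover(s1, s2))  # (123456789, 987654321)
```

Either square root works: replacing $d$ with $-d$ only swaps which decoded value is labeled $m_A$, which is why `recover` returns the pair in sorted order.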

Now that we can recover from two-way collisions, we can shrink the number of cells $n$ in the table. Let $O_i^{(2)}$ be the event that exactly two balls fall into bin $i$. Then the expected number of received messages is

$$n \Pr\left[O_i^{(1)}\right] + 2n \Pr\left[O_i^{(2)}\right] \qquad (1)$$

where $\Pr[O_i^{(2)}] = \binom{m}{2} \left(\frac{1}{n}\right)^2 \left(1 - \frac{1}{n}\right)^{m-2}$. As before, dividing the expected number of received messages (1) by $m$, expanding using the binomial theorem, and ignoring low-order terms gives the expected success rate as:


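$$\mathbb{E}[\text{SuccessRate}] = \frac{n}{m}\left(\Pr\left[O_i^{(1)}\right] + 2\Pr\left[O_i^{(2)}\right]\right) \approx 1 - \frac{1}{2}\left(\frac{m}{n}\right)^2 + \frac{1}{3}\left(\frac{m}{n}\right)^3$$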
So, if we want an expected success rate of 95%, we need a table with $n \approx 2.7m$ cells. This is a far smaller table than before, when we could not handle collisions. In that case we needed $n \approx 19.5m$, which results in much bigger tables, despite each cell being half as big. Shrinking the table reduces the storage and computational burden on the servers.
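The table-size estimates can be checked numerically. The sketch below takes the truncated approximations $1 - x + \frac{1}{2}x^2$ (no collision handling) and $1 - \frac{1}{2}x^2 + \frac{1}{3}x^3$ (two-way recovery), where $x = m/n$, solves each for a 95% target by bisection, and prints the resulting ratio $n/m$. The function names are hypothetical, and because both series are truncated, exact binomial probabilities would give slightly different ratios.

```python
def single_approx(x: float) -> float:
    """E[SuccessRate] without collision handling, as a function of x = m/n."""
    return 1 - x + x**2 / 2


def two_way_approx(x: float) -> float:
    """E[SuccessRate] with two-way collision recovery, as a function of x = m/n."""
    return 1 - x**2 / 2 + x**3 / 3


def cells_per_writer(rate_fn, target: float = 0.95) -> float:
    """Return n/m such that rate_fn(m/n) == target, by bisection on x = m/n."""
    lo, hi = 1e-9, 1.0  # both approximations decrease in x on this interval
    for _ in range(100):
        mid = (lo + hi) / 2
        if rate_fn(mid) > target:
            lo = mid  # still above target: more writers per cell are tolerable
        else:
            hi = mid
    return 1 / lo


print(f"no collision handling: n ~= {cells_per_writer(single_approx):.2f} m")
print(f"two-way recovery:      n ~= {cells_per_writer(two_way_approx):.2f} m")
```

Bisection is used here only because it needs no derivative and both functions are monotone on $(0, 1)$; any root finder would do.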

This two-way collision handling technique generalizes to handle k-way collisions for $k > 2$. To handle k-way collisions, we increase the size of each cell by a factor of k and have each client i write $(m_i, m_i^2, \ldots, m_i^k) \in \mathbb{F}^k$ to its chosen cell. A k-collision gives k equations in k variables that can be efficiently solved to recover all k messages, as long as the characteristic of $\mathbb{F}$ is greater than k [12, 19]. Using $k > 2$ further reduces the table size as the desired success rate approaches one.

The collision handling method described in this section 
will also improve performance of our full system, which 
we describe in the next section. 

Adversarial collisions. The analysis above assumes that 
clients behave honestly. Adversarial clients, however, 
need not write into random rows of the database—i.e., all 
m balls might not be thrown independently and uniformly 
at random. A coalition of clients might, for example, try 
to increase the probability of collisions by writing into the 
database using some malicious strategy. 

By symmetry of writes, we can assume that all m adversarial clients write to the database before the honest clients do. Now a message from an honest client is properly received at the end of an epoch if it avoids all the cells filled by the malicious clients. We can therefore carry out the honest-client analysis above assuming the database contains n − m cells instead of n cells. In other words, given a bound m on the number of malicious clients, we can calculate the required table size n. In practice, if too many collisions are detected at the end of an epoch, the servers can adaptively double the size of the table so that the next epoch has fewer collisions.

3.3 Forward Security 

Even the first-attempt scheme sketched in Section 3.1 provides forward security in the event that all of the servers’ secret keys are compromised [16]. To be precise: an adversary could compromise the state and secret keys of all servers after the servers have processed n write requests from honest clients, but before the time epoch has ended. Even in this case, the adversary will be unable to determine which of the n clients submitted which of the n plaintext messages with a non-negligible advantage over random guessing. (We assume here that clients and servers communicate using encrypted channels which themselves have forward secrecy [54].)

This forward security property means that clients need not trust that s − t servers stay honest forever, just that they are honest at the moment when the client submits its upload request. Being able to weaken the trust assumption about the servers in this way might be valuable in hostile environments, in which an adversary could compromise a server at any time without warning.

Mix-nets do not have this property, since servers must accumulate a set of onion-encrypted messages before shuffling and decrypting them [18]. If an adversary always controls the first mix server and if it can compromise the rest of the mix servers after accumulating a set of ciphertexts, the adversary can de-anonymize all of the system’s users. DC-net-based systems that use “blame” protocols to retroactively discover disruptors have a similar weakness [23, 81].

The full Riposte protocol maintains this forward security property.

4 Improving Bandwidth Efficiency with Distributed Point Functions

This section describes how applying private information retrieval techniques can improve the bandwidth efficiency of the first-attempt protocol.

Notation. The symbol $\mathbb{F}$ denotes an arbitrary finite field, and $\mathbb{Z}_L$ is the ring of integers modulo $L$. We use $e_\ell \in \mathbb{F}^L$ to represent a vector that is zero everywhere except at index $\ell \in \mathbb{Z}_L$, where it has value “1.” Thus, for $m \in \mathbb{F}$, the vector $m \cdot e_\ell \in \mathbb{F}^L$ is the vector whose value is zero everywhere except at index $\ell$, where it has value $m$. For a finite set $S$, the notation $x \xleftarrow{R} S$ indicates that the value of $x$ is sampled independently and uniformly at random from $S$. The element $v[i]$ is the value of a vector $v$ at index $i$. We index vectors starting at zero.

4.1 Definitions 

The bandwidth inefficiency of the protocol sketched above comes from the fact that the client must send an $L$-bit vector to each server to flip a single bit in the logical database. To reduce this $O(L)$ bandwidth overhead, we apply techniques inspired by private information retrieval protocols [20,21,41].

The problem of private information retrieval (PIR) is essentially the converse of the problem we are interested in here. In PIR, the client must read a bit from a replicated database without revealing to the servers the index being read. In our setting, the client must write a bit into a replicated database without revealing to the servers the index being written. Ostrovsky and Shoup first made this connection in the context of a “private information storage” protocol [71].

PIR schemes allow the client to split its query to the servers into shares such that (1) a subset of the shares does not leak information about the index of interest, and (2) the length of the query shares is much less than the length of the database. The core building block of many PIR schemes, which we adopt for our purposes, is a distributed point function. Although Gilboa and Ishai [41] defined distributed point functions as a primitive only recently, many prior PIR schemes make implicit use of the primitive [20,21]. Our definition of a distributed point function follows that of Gilboa and Ishai, except that we generalize the definition to allow for more than two servers.

First, we define a (non-distributed) point function. 

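Definition 4 (Point Function). Fix a positive integer $L$ and a finite field $\mathbb{F}$. For all $\ell \in \mathbb{Z}_L$ and $m \in \mathbb{F}$, the point function $P_{\ell,m} : \mathbb{Z}_L \to \mathbb{F}$ is the function such that $P_{\ell,m}(\ell) = m$ and $P_{\ell,m}(\ell') = 0$ for all $\ell \neq \ell'$.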

That is, the point function $P_{\ell,m}$ has the value 0 when evaluated at any input not equal to $\ell$ and it has the value $m$ when evaluated at $\ell$. For example, if $L = 5$ and $\mathbb{F} = \mathbb{F}_2$, the point function $P_{3,1}$ takes on the values $(0,0,0,1,0)$ when evaluated on the values $(0,1,2,3,4)$ (note that we index vectors from zero).
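As a concrete check of Definition 4, the minimal Python sketch below evaluates a point function on every index of $\mathbb{Z}_L$ and reproduces the $P_{3,1}$ example above. Representing field elements as plain Python integers is an illustrative assumption, and the helper names `point_function` and `as_vector` are hypothetical.

```python
def point_function(ell: int, m: int, L: int):
    """Return P_{ell,m} as a callable on Z_L = {0, ..., L-1}."""
    if not 0 <= ell < L:
        raise ValueError("ell must lie in Z_L")

    def P(ell_prime: int) -> int:
        # Value m at the special index ell, zero everywhere else.
        return m if ell_prime == ell else 0

    return P


def as_vector(P, L: int) -> list[int]:
    """Evaluate a point function on every index of Z_L (indices start at zero)."""
    return [P(i) for i in range(L)]


if __name__ == "__main__":
    print(tuple(as_vector(point_function(3, 1, 5), 5)))  # (0, 0, 0, 1, 0)
```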

An $(s,t)$-distributed point function provides a way to distribute a point function $P_{\ell,m}$ amongst $s$ servers such that no coalition of at most $t$ servers learns anything about $\ell$ or $m$ given their $t$ shares of the function.

Definition 5 (Distributed Point Function (DPF)). Fix a positive integer $L$ and a finite field $\mathbb{F}$. An $(s,t)$-distributed point function consists of a pair of possibly randomized algorithms that implement the following functionalities:

• $\mathrm{Gen}(\ell, m) \to (k_0, \dots, k_{s-1})$. Given an integer $\ell \in \mathbb{Z}_L$ and value $m \in \mathbb{F}$, output a list of $s$ keys.

• $\mathrm{Eval}(k, \ell') \to m'$. Given a key $k$ generated using Gen, and an index $\ell' \in \mathbb{Z}_L$, return a value $m' \in \mathbb{F}$.

We define correctness and privacy for a distributed 
point function as follows: 

• Correctness. For a collection of $s$ keys generated using $\mathrm{Gen}(\ell, m)$, the sum of the outputs of these keys (generated using Eval) must equal the point function $P_{\ell,m}$. More formally, for all $\ell, \ell' \in \mathbb{Z}_L$ and $m \in \mathbb{F}$:

$$\Pr\Big[(k_0, \dots, k_{s-1}) \leftarrow \mathrm{Gen}(\ell, m) : \sum_{i=0}^{s-1} \mathrm{Eval}(k_i, \ell') = P_{\ell,m}(\ell')\Big] = 1$$

where the probability is taken over the randomness of the Gen algorithm.

• Privacy. Let $S$ be any subset of $\{0, \dots, s-1\}$ such that $|S| \le t$. Then for any $\ell \in \mathbb{Z}_L$ and $m \in \mathbb{F}$, let $D_{S,\ell,m}$ denote the distribution of keys $\{k_i \mid i \in S\}$ induced by $(k_0, \dots, k_{s-1}) \leftarrow \mathrm{Gen}(\ell, m)$. We say that an $(s,t)$-DPF maintains privacy if there exists a p.p.t. algorithm Sim such that the following distributions are computationally indistinguishable:

$$D_{S,\ell,m} \approx_c \mathrm{Sim}(S)$$

That is, any subset of at most $t$ keys leaks no information about $\ell$ or $m$. (We can also strengthen this definition to require statistical or perfect indistinguishability.)

Toy Construction. To make this definition concrete, we first construct a trivial information-theoretically secure $(s, s-1)$-distributed point function with length-$L$ keys. As above, we fix a length $L$ and a finite field $\mathbb{F}$.

• $\mathrm{Gen}(\ell, m) \to (k_0, \dots, k_{s-1})$. Generate random vectors $k_0, \dots, k_{s-2} \in \mathbb{F}^L$. Set $k_{s-1} = m \cdot e_\ell - \sum_{i=0}^{s-2} k_i$.

• $\mathrm{Eval}(k, \ell') \to m'$. Interpret $k$ as a vector in $\mathbb{F}^L$. Return the value of the vector $k$ at index $\ell'$.

The correctness property of this construction follows immediately. Privacy is maintained because the distribution of any collection of $s - 1$ keys is independent of $\ell$ and $m$.

This toy construction uses length-$L$ keys to distribute a point function with domain $\mathbb{Z}_L$. Later in this section we describe DPF constructions which use much shorter keys.
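The toy construction is just additive secret sharing of the vector $m \cdot e_\ell$, which the Python sketch below makes explicit. Instantiating $\mathbb{F}$ as the prime field $\mathbb{Z}_P$ with $P = 2^{61} - 1$ is an assumption made only for illustration; the function names are hypothetical.

```python
import secrets

# Assumption for illustration: F is the prime field Z_P with P = 2^61 - 1.
P = 2**61 - 1


def toy_gen(ell: int, m: int, L: int, s: int) -> list[list[int]]:
    """Toy (s, s-1)-DPF Gen: s - 1 uniformly random keys, plus a last key
    chosen so that all s keys sum to m * e_ell in F^L."""
    keys = [[secrets.randbelow(P) for _ in range(L)] for _ in range(s - 1)]
    last = [0] * L
    last[ell] = m % P  # start from m * e_ell ...
    for k in keys:     # ... and subtract every random key
        last = [(a - b) % P for a, b in zip(last, k)]
    keys.append(last)
    return keys


def toy_eval(k: list[int], ell_prime: int) -> int:
    """Toy Eval: a key is just a vector in F^L, so read off index ell'."""
    return k[ell_prime]


if __name__ == "__main__":
    L, s = 5, 3
    keys = toy_gen(ell=3, m=42, L=L, s=s)
    combined = [sum(toy_eval(k, i) for k in keys) % P for i in range(L)]
    print(combined)  # [0, 0, 0, 42, 0]
```

Any $s - 1$ of the keys are independent uniform vectors, which is exactly the privacy argument above; only the sum of all $s$ keys reveals the point function.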

4.2 Applying Distributed Point Functions for Bandwidth Efficiency

We can now use DPFs to improve the efficiency of the write-private database scheme introduced in Section 3.1. We show that the existence of an $(s,t)$-DPF with keys of length $|k|$ (along with standard cryptographic assumptions) implies the existence of a write-private database scheme using $s$ servers that maintains anonymity in the presence of $t$ malicious servers, such that write requests have length $s|k|$. Any DPF construction with short keys thus immediately implies a bandwidth-efficient write-private database scheme.

The construction is a generalization of the one presented in Section 3.1. We now assume that there are $s$ servers such that no more than $t$ of them collude. Each of the $s$ servers maintains a vector in $\mathbb{F}^L$ as its database state, for some fixed finite field $\mathbb{F}$ and integer $L$. Each “row” in the database is now an element of $\mathbb{F}$ and the database has $L$ rows.


When the client wants to write a message $m \in \mathbb{F}$ into location $\ell \in \mathbb{Z}_L$ in the database, the client uses an $(s,t)$-distributed point function to generate a set of $s$ DPF keys:

$$(k_0, \dots, k_{s-1}) \leftarrow \mathrm{Gen}(\ell, m)$$

The client then sends one of the keys to each of the servers. Each server $i$ can then expand the key into a vector $v \in \mathbb{F}^L$ by computing $v[\ell'] = \mathrm{Eval}(k_i, \ell')$ for $\ell' = 0, \dots, L-1$. The server then adds this vector $v$ into its database state, using addition in $\mathbb{F}^L$. At the end of the time epoch, all servers combine their database states to reveal the set of client-submitted messages.

Correctness. The correctness of this construction follows directly from the correctness of the DPF. For each of the $n$ write requests submitted by the clients, denote the $j$-th key in the $i$-th request as $k_{i,j}$, denote the write location as $\ell_i$, and the message being written as $m_i$. When the servers combine their databases at the end of the epoch, the contents of the final database at row $\ell$ will be:

$$d[\ell] = \sum_{i=0}^{n-1} \sum_{j=0}^{s-1} \mathrm{Eval}(k_{i,j}, \ell) = \sum_{i=0}^{n-1} P_{\ell_i, m_i}(\ell) \in \mathbb{F}$$

In words: as desired, the combined database contains the sum of $n$ point functions, one for each of the write requests.
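The Python sketch below walks through the write-private database scheme of this section end to end, using the toy $(s, s-1)$-DPF from Section 4.1 so that it stays self-contained. The prime field $\mathbb{Z}_{2^{61}-1}$, the `Server` class, and the `combine` helper are illustrative assumptions, not Riposte's implementation. Two writes to the same row show up as their sum, exactly as the equation above predicts for a collision.

```python
import secrets

P = 2**61 - 1  # assumption: F = Z_P, chosen only for illustration


def toy_gen(ell, m, L, s):
    """Toy (s, s-1)-DPF Gen: additive shares of the vector m * e_ell."""
    keys = [[secrets.randbelow(P) for _ in range(L)] for _ in range(s - 1)]
    last = [(-sum(col)) % P for col in zip(*keys)] if keys else [0] * L
    last[ell] = (last[ell] + m) % P
    return keys + [last]


def toy_eval(k, ell_prime):
    return k[ell_prime]


class Server:
    """One of the s servers; holds its share of the database, a vector in F^L."""

    def __init__(self, L):
        self.L = L
        self.state = [0] * L

    def apply_write(self, key):
        # Expand the key into v[l'] = Eval(k, l') for l' = 0..L-1, then add v in.
        v = [toy_eval(key, lp) for lp in range(self.L)]
        self.state = [(a + b) % P for a, b in zip(self.state, v)]


def combine(servers):
    """End of epoch: the servers sum their states to reveal d."""
    return [sum(col) % P for col in zip(*(srv.state for srv in servers))]


if __name__ == "__main__":
    L, s = 8, 3
    servers = [Server(L) for _ in range(s)]
    writes = [(1, 10), (5, 20), (5, 7)]  # (row ell_i, message m_i); row 5 collides
    for ell, m in writes:
        for srv, key in zip(servers, toy_gen(ell, m, L, s)):
            srv.apply_write(key)
    print(combine(servers))  # [0, 10, 0, 0, 0, 27, 0, 0]
```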

Anonymity. The anonymity of this construction follows directly from the privacy property of the DPF. Given the plaintext database state $d$ (as defined above), any coalition of $t$ servers can simulate its view of the protocol. By definition of DPF privacy, there exists a simulator Sim, which simulates the distribution of any subset of $t$ DPF keys generated using Gen. The coalition of servers can use this simulator to simulate each of the $n$ write requests it sees during a run of the protocol. Thus, the servers can simulate their view of a protocol run and cannot win the anonymity game with non-negligible advantage.

Efficiency. A client in this scheme sends $|k|$ bits to each server (where $k$ is a DPF key), so the bandwidth efficiency of the scheme depends on the efficiency of the DPF. As we will show later in this section, $|k|$ can be much smaller than the length of the database.

4.3 A Two-Server Scheme Tolerating One Malicious Server

Having established that DPFs with short keys lead to bandwidth-efficient write-private database schemes, we now present one such DPF construction. This construction is a simplification of the computational PIR scheme of Chor and Gilboa [20].


This is a $(2,1)$-DPF with keys of length $O(\sqrt{L})$ operating on a domain of size $L$. This DPF yields a two-server write-private database scheme tolerating one malicious server such that writing into a database of size $L$ requires sending $O(\sqrt{L})$ bits to each server. Gilboa and Ishai [41] construct a $(2,1)$-DPF with even shorter keys ($|k| = \mathrm{polylog}(L)$), but the construction presented here is efficient enough for the database sizes we use in practice. Although the DPF construction works over any field, we describe it here using the binary field $\mathbb{F} = \mathbb{F}_{2^k}$ (the field of $k$-bit bitstrings) to simplify the exposition.

When $\mathrm{Eval}(k, \ell')$ is run on every integer $\ell' \in \{0, \dots, L-1\}$, its output is a vector of $L$ field elements. The DPF key construction conceptually works by representing this vector of $L$ field elements as an $x \times y$ matrix, such that $xy \ge L$. The trick that makes the construction work is that the size of the keys needs only to grow with the size of the sides of this matrix rather than its area. The DPF keys that $\mathrm{Gen}(\ell, m)$ outputs give an efficient way to construct two matrices $M_A$ and $M_B$ that differ only at one cell $\ell = (\ell_x, \ell_y) \in \mathbb{Z}_x \times \mathbb{Z}_y$ (Figure 2).

Fix a binary finite field $\mathbb{F} = \mathbb{F}_{2^k}$, a DPF domain size $L$, and integers $x$ and $y$ such that $xy \ge L$. (Later in this section, we describe how to choose $x$ and $y$ to minimize the key size.) The construction requires a pseudo-random generator (PRG) $G$ that stretches seeds from some space $\mathbb{S}$ into length-$y$ vectors of elements of $\mathbb{F}$ [51]. So the signature of the PRG is $G : \mathbb{S} \to \mathbb{F}^y$. In practice, an implementation might use AES-128 in counter mode as the pseudo-random generator [68].

The algorithms comprising the DPF are: 

• $\mathrm{Gen}(\ell, m) \to (k_A, k_B)$. Compute integers $\ell_x \in \mathbb{Z}_x$ and $\ell_y \in \mathbb{Z}_y$ such that $\ell = \ell_x y + \ell_y$. Sample a random bit-vector $b_A \xleftarrow{R} \{0,1\}^x$, a random vector of PRG seeds $s_A \xleftarrow{R} \mathbb{S}^x$, and a single random PRG seed $s^*_{\ell_x} \xleftarrow{R} \mathbb{S}$. Given $b_A$ and $s_A$, we define $b_B$ and $s_B$ as:

$$\begin{aligned} b_A &= (b_0, \dots, b_{\ell_x}, \dots, b_{x-1}) & \qquad b_B &= (b_0, \dots, \bar{b}_{\ell_x}, \dots, b_{x-1}) \\ s_A &= (s_0, \dots, s_{\ell_x}, \dots, s_{x-1}) & \qquad s_B &= (s_0, \dots, s^*_{\ell_x}, \dots, s_{x-1}) \end{aligned}$$

That is, the vectors $b_A$ and $b_B$ (similarly $s_A$ and $s_B$) differ only at index $\ell_x$, where $\bar{b}_{\ell_x}$ denotes the complement of the bit $b_{\ell_x}$.

Let $m \cdot e_{\ell_y}$ be the vector in $\mathbb{F}^y$ of all zeros except that it has value $m$ at index $\ell_y$. Define $v \leftarrow m \cdot e_{\ell_y} + G(s_{\ell_x}) + G(s^*_{\ell_x})$.

The output DPF keys are:

$$k_A = (b_A, s_A, v) \qquad k_B = (b_B, s_B, v)$$

• $\mathrm{Eval}(k, \ell') \to m'$. Interpret $k$ as a tuple $(b, s, v)$. To evaluate the DPF at index $\ell'$, first write $\ell'$ as an $(\ell'_x, \ell'_y)$ tuple such that $\ell'_x \in \mathbb{Z}_x$, $\ell'_y \in \mathbb{Z}_y$, and $\ell' = \ell'_x y + \ell'_y$. Use the PRG $G$ to stretch the $\ell'_x$-th seed of $s$ into a length-$y$ vector: $g \leftarrow G(s[\ell'_x])$. Return $m' \leftarrow g[\ell'_y] + b[\ell'_x] \cdot v[\ell'_y]$.

Figure 2: Left: We represent the output of Eval as an $x \times y$ matrix of field elements. Left-center: Construction of the $v$ vector used in the DPF keys. Right: using the $v$, $s$, and $b$ vectors, Eval expands each of the two keys into an $x \times y$ matrix of field elements. These two matrices sum to zero everywhere except at $(\ell_x, \ell_y) = (3,4)$, where they sum to $m$.

Figure 2 graphically depicts how Eval stretches the keys into a table of $x \times y$ field elements.
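The Gen and Eval algorithms above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Riposte's Go implementation: elements of $\mathbb{F}_{2^{32}}$ are 32-bit integers added with XOR, seeds are 16 random bytes, and SHA-256 in counter mode stands in for the AES-128-CTR PRG suggested in the text. The demo uses $x = 4$, $y = 6$, and $\ell = 22$, so $(\ell_x, \ell_y) = (3, 4)$, the same cell highlighted in Figure 2.

```python
import hashlib
import secrets

K_BYTES = 4      # assumption: F = F_{2^32}; elements are 4-byte ints and addition is XOR
SEED_BYTES = 16  # 128-bit PRG seeds


def prg(seed: bytes, y: int) -> list[int]:
    """G: S -> F^y. Stand-in PRG: SHA-256 in counter mode (the text suggests AES-128-CTR)."""
    out = b""
    counter = 0
    while len(out) < y * K_BYTES:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return [int.from_bytes(out[i * K_BYTES:(i + 1) * K_BYTES], "big") for i in range(y)]


def gen(ell: int, m: int, x: int, y: int):
    """(2,1)-DPF Gen(ell, m) -> (k_A, k_B) for an x-by-y matrix with xy >= L; m < 2^32."""
    lx, ly = divmod(ell, y)                      # ell = lx * y + ly
    b_a = [secrets.randbelow(2) for _ in range(x)]
    s_a = [secrets.token_bytes(SEED_BYTES) for _ in range(x)]
    s_star = secrets.token_bytes(SEED_BYTES)
    b_b = list(b_a)
    b_b[lx] ^= 1                                 # complement the bit at index lx
    s_b = list(s_a)
    s_b[lx] = s_star                             # fresh seed at index lx
    g_a, g_star = prg(s_a[lx], y), prg(s_star, y)
    v = [g_a[j] ^ g_star[j] for j in range(y)]   # G(s_lx) + G(s*_lx)
    v[ly] ^= m                                   # ... + m * e_ly
    return (b_a, s_a, v), (b_b, s_b, v)


def eval_dpf(key, lp: int, y: int) -> int:
    """Eval(k, l') = g[l'_y] + b[l'_x] * v[l'_y], where g = G(s[l'_x])."""
    b, s, v = key
    lpx, lpy = divmod(lp, y)
    g = prg(s[lpx], y)
    return g[lpy] ^ (v[lpy] if b[lpx] else 0)


if __name__ == "__main__":
    x, y = 4, 6
    k_a, k_b = gen(ell=22, m=0xDEADBEEF, x=x, y=y)
    combined = [eval_dpf(k_a, i, y) ^ eval_dpf(k_b, i, y) for i in range(x * y)]
    nonzero = [i for i, c in enumerate(combined) if c]
    print(nonzero, hex(combined[22]))  # [22] 0xdeadbeef
```

Because the two keys agree everywhere except at index $\ell_x$, every other row of the two matrices cancels under XOR. In row $\ell_x$ exactly one of the two keys adds $v$, which strips off both PRG pads and leaves $m \cdot e_{\ell_y}$.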

Correctness. We prove correctness of the scheme in Appendix B.

Privacy. The privacy property requires that there exists an efficient simulator that, on input “A” or “B,” outputs samples from a distribution that is computationally indistinguishable from the distribution of DPF keys $k_A$ or $k_B$.

The simulator Sim simulates each component of the DPF key as follows: It samples $b \xleftarrow{R} \{0,1\}^x$, $s \xleftarrow{R} \mathbb{S}^x$, and $v \xleftarrow{R} \mathbb{F}^y$. The simulator returns $(b, s, v)$.

We must now argue that the simulator's output distribution is computationally indistinguishable from that induced by the distribution of a single output of Gen. Since the $b$ and $s$ vectors output by Gen are uniformly random, the simulation of these components is perfect. The $v$ vector output by Gen is computationally indistinguishable from random, since it is padded with the output of the PRG seeded with a seed unknown to the holder of the key. An efficient algorithm to distinguish the simulated $v$ vector from random can then also distinguish the PRG output from random.

Key Size. A key for this DPF scheme consists of: a vector in $\{0,1\}^x$, a vector in $\mathbb{S}^x$, and a vector in $\mathbb{F}^y$. Let $\alpha$ be the number of bits required to represent an element of $\mathbb{S}$ and let $\beta$ be the number of bits required to represent an element of $\mathbb{F}$. The total length of a key is then:

$$|k| = (1 + \alpha)x + \beta y$$

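For fixed spaces $\mathbb{S}$ and $\mathbb{F}$, we can find the optimal choices of $x$ and $y$ to minimize the key length. To do so, we solve:

$$\min_{x,y} \big((1 + \alpha)x + \beta y\big) \quad \text{subject to} \quad xy \ge L$$

and conclude that the optimal values of $x$ and $y$ are:

$$x = c\sqrt{L} \quad \text{and} \quad y = \frac{1}{c}\sqrt{L}, \quad \text{where} \quad c = \sqrt{\frac{\beta}{1 + \alpha}}$$

The key size is then $O(\sqrt{L})$.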

When using a database table of one million rows in length ($L = 2^{20}$), a row length of 1 KB per row ($\mathbb{F} = \mathbb{F}_{2^{8192}}$), and a PRG seed size of 128 bits (using AES-128, for example), the keys will be roughly 263 KB in length. For these parameters, the keys for the naive construction (Section 3.1) would be 1 GB in length. Application of efficient DPFs thus yields a 4,000× bandwidth savings in this case.
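The optimization and the figures just quoted can be reproduced with a few lines of Python. This sketch rounds the real-valued optimum up to integer dimensions that still satisfy $xy \ge L$, which is an assumption about how one might pick $x$ and $y$ in practice; the function names are hypothetical.

```python
import math


def optimal_dims(L: int, alpha: int, beta: int) -> tuple[float, float]:
    """Real-valued minimizer of (1 + alpha) x + beta y subject to x y >= L."""
    c = math.sqrt(beta / (1 + alpha))
    return c * math.sqrt(L), math.sqrt(L) / c


def key_bits(x: int, y: int, alpha: int, beta: int) -> int:
    """|k| = (1 + alpha) x + beta y, in bits."""
    return (1 + alpha) * x + beta * y


def example(L: int = 2**20, alpha: int = 128, beta: int = 8192):
    """Parameters from the text: 2^20 rows, 128-bit seeds, 1 KB (8192-bit) rows."""
    x_opt, _ = optimal_dims(L, alpha, beta)
    x = math.ceil(x_opt)
    y = math.ceil(L / x)              # integer dimensions that still satisfy x*y >= L
    dpf_kb = key_bits(x, y, alpha, beta) / 8 / 1000
    naive_kb = L * beta / 8 / 1000    # naive key: one full row per database row
    return x, y, dpf_kb, naive_kb / dpf_kb


if __name__ == "__main__":
    x, y, dpf_kb, savings = example()
    print(f"x={x}, y={y}, key = {dpf_kb:.1f} KB, savings = {savings:,.0f}x")
```

With the parameters from the text this yields dimensions of roughly $8161 \times 129$, a key of about 264 KB (slightly above the real-valued optimum of about 263 KB because of integer rounding), and a savings factor of about 4,000× relative to the 1 GB naive key.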


Computational Efficiency. A second benefit of this scheme is that both the Gen and Eval routines are computationally efficient, since they just require performing finite field additions (i.e., XOR for binary fields) and PRG operations (i.e., computations of the AES function). The construction requires no public-key primitives.


4.4 An $s$-Server Scheme Tolerating $s-1$ Malicious Servers

The $(2,1)$-DPF scheme described above achieved a key size of $O(\sqrt{L})$ bits using only symmetric-key primitives. The limitation of that construction is that it only maintains privacy when a single key is compromised. In the context of a write-private database scheme, this means that the construction can only maintain anonymity in the presence of a single malicious server. It would be much better to have a write-private database scheme with $s$ servers that maintains anonymity in the presence of $s - 1$ malicious servers. To achieve this stronger security notion, we need a bandwidth-efficient $(s, s-1)$-distributed point function.

In this section, we construct an $(s, s-1)$-DPF where each key has size $O(\sqrt{L})$. We do so at the cost of requiring more expensive public-key cryptographic operations instead of the symmetric-key operations used in the prior DPF. While the $(2,1)$-DPF construction above directly follows the work of Chor and Gilboa [20], this $(s, s-1)$-DPF construction is novel, as far as we know. In recent work, Boyle et al. present an $(s, s-1)$-DPF construction using only symmetric-key operations, but this construction exhibits a key size exponential in the number of servers $s$ [13].

This construction uses a seed-homomorphic pseudo-random generator [3, 11, 67] to split the key for the pseudo-random generator $G$ across a collection of $s$ DPF keys.

Definition 6 (Seed-Homomorphic PRG). A seed-homomorphic PRG is a pseudo-random generator $G$ mapping seeds in a group $(\mathbb{S}, \oplus)$ to outputs in a group $(\mathbb{G}, \otimes)$ with the additional property that for any $s_0, s_1 \in \mathbb{S}$:

$$G(s_0 \oplus s_1) = G(s_0) \otimes G(s_1)$$

It is possible to construct a simple seed-homomorphic PRG from the decision Diffie-Hellman (DDH) assumption [11,67]. The public parameters for the scheme are a list of $y$ generators chosen at random from an order-$q$ group $\mathbb{G}$ in which the DDH problem is hard [10]. For example, if $\mathbb{G}$ is an elliptic curve group [61], then the public parameters will be $y$ points $(P_0, \dots, P_{y-1}) \in \mathbb{G}^y$. The seed space is $\mathbb{Z}_q$ and the generator outputs vectors in $\mathbb{G}^y$. On input $s \in \mathbb{Z}_q$, the generator outputs $(sP_0, \dots, sP_{y-1})$. The generator is seed-homomorphic because, for any $s_0, s_1 \in \mathbb{Z}_q$ and for all $i \in \{0, \dots, y-1\}$: $s_0 P_i + s_1 P_i = (s_0 + s_1) P_i$.

As in the prior DPF construction, we fix a DPF domain size $L$, and integers $x$ and $y$ such that $xy \ge L$. The construction requires a seed-homomorphic PRG $G : \mathbb{S} \to \mathbb{G}^y$, for some group $\mathbb{G}$ of prime order $q$.

For consistency with the prior DPF construction, we will write the group operation in $\mathbb{G}$ using additive notation. Thus, the group operation applied component-wise to vectors $u, v \in \mathbb{G}^y$ results in the vector $(u + v) \in \mathbb{G}^y$. Since $\mathbb{G}$ has order $q$, $qA = 0$ for all $A \in \mathbb{G}$.

The algorithms comprising the $(s, s-1)$-DPF are:

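• $\mathrm{Gen}(\ell, m) \to (k_0, \dots, k_{s-1})$. Compute integers $\ell_x \in \mathbb{Z}_x$ and $\ell_y \in \mathbb{Z}_y$ such that $\ell = \ell_x y + \ell_y$. Sample random integer-valued vectors $b_0, \dots, b_{s-2} \xleftarrow{R} (\mathbb{Z}_q)^x$, random vectors of PRG seeds $s_0, \dots, s_{s-2} \xleftarrow{R} \mathbb{S}^x$, and a single random PRG seed $s^* \xleftarrow{R} \mathbb{S}$.

Select $b_{s-1} \in (\mathbb{Z}_q)^x$ such that $\sum_{i=0}^{s-1} b_i = e_{\ell_x} \pmod{q}$ and select $s_{s-1} \in \mathbb{S}^x$ such that $\sum_{i=0}^{s-1} s_i = s^* \cdot e_{\ell_x} \in \mathbb{S}^x$.

Define $v \leftarrow m \cdot e_{\ell_y} - G(s^*)$.

The DPF key for server $i \in \{0, \dots, s-1\}$ is $k_i = (b_i, s_i, v)$.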


• $\mathrm{Eval}(k, \ell') \to m'$. Interpret $k$ as a tuple $(b, s, v)$. To evaluate the DPF at index $\ell'$, first write $\ell'$ as an $(\ell'_x, \ell'_y)$ tuple such that $\ell'_x \in \mathbb{Z}_x$, $\ell'_y \in \mathbb{Z}_y$, and $\ell' = \ell'_x y + \ell'_y$. Use the PRG $G$ to stretch the $\ell'_x$-th seed of $s$ into a length-$y$ vector: $g \leftarrow G(s[\ell'_x])$. Return $m' \leftarrow g[\ell'_y] + b[\ell'_x] \cdot v[\ell'_y]$.

We omit correctness and privacy proofs, since they follow exactly the same structure as those used to prove security of our prior DPF construction. The only difference is that correctness here relies on the fact that $G$ is a seed-homomorphic PRG, rather than a conventional PRG. As in the DPF construction of Section 4.3, the keys here are of length $O(\sqrt{L})$.

Computational Efficiency. The main computational cost 
of this DPF construction comes from the use of the 
seed-homomorphic PRG G. Unlike a conventional PRG, 
which can be implemented using AES or another fast 
block cipher in counter mode, known constructions of 
seed-homomorphic PRGs require algebraic groups [67] or 
lattice-based cryptography [3,11]. 

When instantiating the $(s, s-1)$-DPF with the DDH-based PRG construction in elliptic curve groups, each call to the DPF Eval routine requires an expensive elliptic curve scalar multiplication. Since elliptic curve operations are, per byte, orders of magnitude slower than AES operations, this $(s, s-1)$-DPF will be orders of magnitude slower than the $(2,1)$-DPF. Security against an arbitrary number of malicious servers comes at the cost of computational efficiency, at least for these DPF constructions.

With DPFs, we can now construct a bandwidth-efficient write-private database scheme that tolerates one malicious server (first construction) or $s - 1$ out of $s$ malicious servers (second construction).

5 Preventing Disruptors 

The first-attempt construction of our write-private database scheme (Section 3.1) had two limitations: (1) client write requests were very large and (2) malicious clients could corrupt the database state by sending malformed write requests. We addressed the first of these two challenges in Section 4. In this section, we address the second challenge.

A client write request in our protocol just consists of a collection of $s$ DPF keys. The client sends one key to each of the $s$ servers. The servers must collectively decide whether the collection of $s$ keys is a valid output of the DPF Gen routine, without revealing any information about the keys themselves.

One way to view the servers' task here is as a secure multi-party computation [45,82]. Each server $i$'s private input is its DPF key $k_i$. The output of the protocol is a single bit, which determines if the $s$ keys $(k_0, \dots, k_{s-1})$ are a well-formed collection of DPF keys.

Since we already rely on servers for availability (Section 2.2), we need not protect against servers maliciously trying to manipulate the output of the multi-party protocol. Such manipulation could only result in corrupting the database (if a malicious server accepts a write request that it should have rejected) or denying service to an honest client (if a malicious server rejects a write request that it should have accepted). Since both attacks are tantamount to denial of service, we need not consider them.

We do care, in contrast, about protecting client privacy against malicious servers. A malicious server participating in the protocol should not gain any additional information about the private inputs of other parties, no matter how it deviates from the protocol specification.

We construct two protocols for checking the validity of client write requests. The first protocol is computationally inexpensive, but requires introducing a third non-colluding party to the two-server scheme. The second protocol requires relatively expensive zero-knowledge proofs [34,46,47,74], but it maintains security when all but one of $s$ servers is malicious. Both of these protocols must satisfy the standard notions of soundness, completeness, and zero-knowledge [15].

5.1 Three-Party Protocol 

Our first protocol for detecting malformed write requests works with the $(2,1)$-DPF scheme presented in Section 4.3. The protocol uses only hashing and finite field additions, so it is computationally inexpensive. The downside is that it requires introducing a third audit server, which must not collude with either of the other two servers.

We first develop a three-party protocol called AlmostEqual that we use as a subroutine to implement the full write request validation protocol. The AlmostEqual protocol takes place between three parties: server A, server B, and an audit server. Server A's private input is a vector $v_A \in \mathbb{F}^n$ and server B's private input is a vector $v_B \in \mathbb{F}^n$. The audit server has no private input. The output of the AlmostEqual protocol is the bit “1” if $v_A$ and $v_B$ differ at exactly one index and the bit “0” otherwise. As with classical secure multi-party computations, the goal of the protocol is to accurately compute the output without leaking any extraneous information about the players' private inputs [33,45,82]. We use AlmostEqual in such a way that, whenever the client's write request is properly formed and whenever no two servers collude, the output of the protocol will be “1.” Thus, we need only prove the protocol secure in the case when the output is “1.”

We denote an instance of the three-party protocol as $\mathrm{AlmostEqual}(v_A, v_B)$, where the arguments denote the two secret inputs of party A and party B. The protocol proceeds as follows:

1. Servers A and B use a coin-flipping protocol [9] to sample $n$ hash functions $h_0, \dots, h_{n-1}$ from a family of pairwise independent hash functions $\mathcal{H}$ [59] having domain $\mathbb{F}$. The servers also agree upon a random “shift” value $f \in \mathbb{Z}_n$.

2. Server A computes the values $m_i \leftarrow h_i(v_A[i])$ for every index $i \in \{0, \dots, n-1\}$ and sends $(m_f, m_{f+1}, \dots, m_{n-1}, m_0, \dots, m_{f-1})$ to the auditor.

3. Server B repeats Step 2 with $v_B$.

4. The audit server returns “1” to servers A and B if and only if the vectors it receives from the two servers are equal at every index except one. The auditor returns “0” otherwise.
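A minimal Python sketch of AlmostEqual follows. It assumes the vector entries are integers below the prime $2^{61} - 1$ and uses the classic pairwise independent family $h_{a,c}(z) = (az + c) \bmod (2^{61} - 1)$. The `shared_coins` function is a hypothetical stand-in for the coin-flipping protocol, which is not implemented here.

```python
import secrets

PRIME = 2**61 - 1  # assumption: entries of v_A and v_B are integers in [0, PRIME)


def shared_coins(n):
    """Stand-in for coin flipping: n pairwise independent hashes
    h_i(z) = (a_i * z + c_i) mod PRIME, and a random shift f in Z_n."""
    hashes = [(secrets.randbelow(PRIME), secrets.randbelow(PRIME)) for _ in range(n)]
    return hashes, secrets.randbelow(n)


def server_message(vec, hashes, f):
    """Steps 2-3: m_i = h_i(vec[i]), sent as (m_f, ..., m_{n-1}, m_0, ..., m_{f-1})."""
    m = [(a * z + c) % PRIME for (a, c), z in zip(hashes, vec)]
    return m[f:] + m[:f]


def auditor(msg_a, msg_b):
    """Step 4: return 1 iff the two received vectors differ at exactly one index."""
    differing = sum(1 for u, w in zip(msg_a, msg_b) if u != w)
    return 1 if differing == 1 else 0


def almost_equal(v_a, v_b):
    hashes, f = shared_coins(len(v_a))  # known to servers A and B, not to the auditor
    return auditor(server_message(v_a, hashes, f), server_message(v_b, hashes, f))


if __name__ == "__main__":
    print(almost_equal([5, 6, 7, 8], [5, 6, 0, 8]))  # 1
    print(almost_equal([5, 6, 7, 8], [5, 6, 7, 8]))  # 0
    print(almost_equal([1, 2, 3, 4], [0, 2, 0, 4]))  # 0
```

Two distinct entries hash to the same value only when the sampled $a_i$ is zero, so the outputs shown hold except with probability about $2^{-61}$ per differing entry. The shift $f$ hides from the auditor which position of the original vectors differs.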

We include proofs of soundness, correctness, and zero-knowledge for this construction in Appendix C.

The keys for the $(2,1)$-DPF construction have the form

$$k_A = (b_A, s_A, v) \qquad k_B = (b_B, s_B, v)$$

In a correctly formed pair of keys, the $b$ and $s$ vectors differ at a single index $\ell_x$, and the $v$ vector is equal to $v = m \cdot e_{\ell_y} + G(s_A[\ell_x]) + G(s_B[\ell_x])$.

To determine whether a pair of keys is correct, server A constructs a test vector $t_A$ such that $t_A[i] = b_A[i] \,\|\, s_A[i]$ for $i \in \{0, \dots, x-1\}$ (where $\|$ denotes concatenation). Server B constructs a test vector $t_B$ in the same way and the two servers, along with the auditor, run the protocol $\mathrm{AlmostEqual}(t_A, t_B)$. If the output of this protocol is “1,” then the servers conclude that their $b$ and $s$ vectors differ at a single index, though the protocol does not reveal to the servers which index this is. Otherwise, the servers reject the write request.

Next, the servers must verify that the $v$ vector is well-formed. To do so, the servers compute another pair of test vectors:

$$u_A = \sum_{i=0}^{x-1} G(s_A[i]) \qquad u_B = v + \sum_{i=0}^{x-1} G(s_B[i])$$

The servers run $\mathrm{AlmostEqual}(u_A, u_B)$ and accept the write request as valid if it returns “1.”
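The following self-contained Python sketch builds a $(2,1)$-DPF key pair as in Section 4.3 and computes the $t$ and $u$ test vectors exactly as defined above. To keep it short it compares the vectors directly instead of running the hashed, shifted AlmostEqual exchange, and it assumes $\mathbb{F}_{2^{32}}$ with XOR addition and a SHA-256 counter-mode stand-in for the PRG. The $m = 0$ case in the demo shows the $u$ vectors coinciding, which is the leak the next paragraph addresses.

```python
import hashlib
import secrets

K = 4  # assumption: elements of F_{2^32} as 4-byte ints; field addition is XOR


def prg(seed: bytes, y: int) -> list[int]:
    """Stand-in PRG G: S -> F^y built from SHA-256 in counter mode."""
    blocks = (y * K + 31) // 32
    stream = b"".join(hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
                      for i in range(blocks))
    return [int.from_bytes(stream[j * K:(j + 1) * K], "big") for j in range(y)]


def xor_vec(u, w):
    return [a ^ b for a, b in zip(u, w)]


def gen(ell, m, x, y):
    """(2,1)-DPF Gen over F_{2^32}, as in Section 4.3."""
    lx, ly = divmod(ell, y)
    b_a = [secrets.randbelow(2) for _ in range(x)]
    s_a = [secrets.token_bytes(16) for _ in range(x)]
    b_b, s_b = list(b_a), list(s_a)
    b_b[lx] ^= 1
    s_b[lx] = secrets.token_bytes(16)
    v = xor_vec(prg(s_a[lx], y), prg(s_b[lx], y))
    v[ly] ^= m
    return (b_a, s_a, v), (b_b, s_b, v)


def make_test_vectors(key, y, add_v):
    """t[i] = b[i] || s[i]; u = sum_i G(s[i]), plus v for server B (add_v=True)."""
    b, s, v = key
    t = [bytes([b[i]]) + s[i] for i in range(len(b))]
    u = list(v) if add_v else [0] * y
    for seed in s:
        u = xor_vec(u, prg(seed, y))
    return t, u


def differ_at_one(p, r):
    """The predicate AlmostEqual computes (without its hashing and shift)."""
    return sum(1 for a, c in zip(p, r) if a != c) == 1


if __name__ == "__main__":
    for m in (0x1234, 0):
        k_a, k_b = gen(ell=13, m=m, x=4, y=5)
        t_a, u_a = make_test_vectors(k_a, 5, add_v=False)
        t_b, u_b = make_test_vectors(k_b, 5, add_v=True)
        print(differ_at_one(t_a, t_b), differ_at_one(u_a, u_b))
    # True True    (well-formed keys, m != 0)
    # True False   (m = 0: u_A == u_B, which the auditor would see)
```

Every seed other than index $\ell_x$ appears in both sums and cancels, so $u_A + u_B = m \cdot e_{\ell_y}$ for well-formed keys.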
We prove security of this construction in Appendix D.

An important implementation note is that if $m = 0$, that is, if the client writes the string of all zeros into the database, then the $u$ vectors will not differ at any index and this information is leaked to the auditor. The protocol only provides security if the vectors differ at exactly one index. To avoid this information leakage, client requests must be defined such that $m \neq 0$ in every write request. To achieve this, clients could define some special non-zero value to indicate “zero” or could use a padding scheme to ensure that zero values occur with negligible probability.
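One possible padding scheme, sketched in Python below, guarantees $m \neq 0$ by construction: every row starts with a fixed non-zero marker byte followed by a two-byte length. The marker value, the header layout, and the function names are hypothetical choices for illustration, not the row format Riposte uses.

```python
MARKER = 0x01  # hypothetical non-zero marker byte; every encoded row is therefore non-zero


def encode_row(payload: bytes, row_bytes: int) -> bytes:
    """Encode a payload as marker || 2-byte length || payload || zero padding."""
    header = bytes([MARKER]) + len(payload).to_bytes(2, "big")
    if len(header) + len(payload) > row_bytes:
        raise ValueError("payload does not fit in one row")
    return (header + payload).ljust(row_bytes, b"\x00")


def decode_row(row: bytes):
    """Return the payload, or None for an empty row or one that fails to parse."""
    if row[0] != MARKER:
        return None  # e.g. an unused all-zero row, or two colliding writes XORed together
    n = int.from_bytes(row[1:3], "big")
    if 3 + n > len(row):
        return None
    return row[3:3 + n]


if __name__ == "__main__":
    row = encode_row(b"", 16)         # even an empty message encodes to a non-zero row
    print(any(row), decode_row(row))  # True b''
```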

As a practical matter, the audit server needs to be able to match up the portions of write requests coming from server A with those coming from server B. Riposte achieves this as follows: When the client sends its upload request to server A, the client includes a cryptographic hash of the request it sent to server B (and vice versa). Both servers can use these hashes to derive a common nonce for the request. When the servers send audit requests to the audit server, they include the nonce for the write request in question. The audit server can use the nonce to match every audit request from server A with the corresponding request from server B.
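The text does not pin down how the common nonce is derived from the two hashes, so the Python sketch below makes one plausible assumption: each request is hashed with SHA-256 and the nonce is $H(H(\mathit{req}_A) \,\|\, H(\mathit{req}_B))$, computed in a fixed A-then-B order so that both servers obtain the same value. The `AuditServer` class, which pairs up audit messages by nonce, is likewise hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def client_uploads(req_a: bytes, req_b: bytes):
    """Each server receives its own request plus the hash of the other server's request."""
    return (req_a, h(req_b)), (req_b, h(req_a))


def nonce_at_a(upload) -> bytes:
    req_a, hash_b = upload
    return h(h(req_a) + hash_b)


def nonce_at_b(upload) -> bytes:
    req_b, hash_a = upload
    return h(hash_a + h(req_b))  # same A-then-B order as server A


class AuditServer:
    """Buffers audit messages until both halves for a nonce have arrived."""

    def __init__(self):
        self.pending = {}

    def receive(self, nonce: bytes, sender: str, audit_msg):
        if nonce not in self.pending:
            self.pending[nonce] = (sender, audit_msg)
            return None
        first = self.pending.pop(nonce)
        return first, (sender, audit_msg)  # matched pair, ready to compare


if __name__ == "__main__":
    up_a, up_b = client_uploads(b"key for A", b"key for B")
    audit = AuditServer()
    print(audit.receive(nonce_at_a(up_a), "A", [1, 2, 3]))  # None (waiting for B)
    print(audit.receive(nonce_at_b(up_b), "B", [1, 9, 3]))  # (('A', [1, 2, 3]), ('B', [1, 9, 3]))
```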

This three-party protocol is very efficient: it only requires $O(\sqrt{L})$ applications of a hash function and $O(\sqrt{L})$ communication from the servers to the auditor. The auditor only performs a simple string comparison, so it needs minimal computational and storage capabilities.

5.2 Zero Knowledge Techniques 

Our second technique for detecting disruptors makes use 
of non-interactive zero-knowledge proofs [14,47,74]. 

We apply zero-knowledge techniques to allow clients to prove the well-formedness of their write requests. This technique works in combination with the $(s, s-1)$-DPF presented in Section 4.4 and maintains client write-privacy when all but one of $s$ servers is dishonest.

The keys for the $(s, s-1)$-DPF scheme are tuples $(b_i, s_i, v)$ such that:

$$\sum_{i=0}^{s-1} b_i = e_{\ell_x} \qquad \sum_{i=0}^{s-1} s_i = s^* \cdot e_{\ell_x} \qquad v = m \cdot e_{\ell_y} - G(s^*)$$

To prove that its write request was correctly formed, we have the client perform zero-knowledge proofs over collections of Pedersen commitments [72]. The public parameters for the Pedersen commitment scheme consist of a group $\mathbb{G}$ of prime order $q$ and two generators $P$ and $Q$ of $\mathbb{G}$ such that no one knows the discrete logarithm $\log_Q P$. A Pedersen commitment to a message $m \in \mathbb{Z}_q$ with randomness $r \in \mathbb{Z}_q$ is $C(m, r) = (mP + rQ) \in \mathbb{G}$ (writing the group operation additively). Pedersen commitments are homomorphic, in that given commitments to $m_0$ and $m_1$, it is possible to compute a commitment to $m_0 + m_1$:

$$C(m_0, r_0) + C(m_1, r_1) = C(m_0 + m_1, r_0 + r_1)$$
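The homomorphic property can be checked with a toy Python sketch. It substitutes the insecure order-$q$ subgroup modulo $p = 2039$ for a real elliptic curve group, writing $mP + rQ$ multiplicatively as $P^m Q^r \bmod p$. Sampling $P$ and $Q$ at random inside the script is an assumption for illustration only; a deployment would derive them so that no one can know $\log_Q P$, for example by hashing into the group.

```python
import secrets

# Toy group (insecure): order-q subgroup of quadratic residues mod p = 2q + 1.
p, q = 2039, 1019


def random_element() -> int:
    """A random non-identity element of the order-q subgroup."""
    return pow(2 + secrets.randbelow(p - 3), 2, p)


P, Q = random_element(), random_element()  # assumption: sampled here, not hashed to the group


def commit(m: int, r: int) -> int:
    """C(m, r) = mP + rQ, written multiplicatively as P^m * Q^r mod p."""
    return (pow(P, m % q, p) * pow(Q, r % q, p)) % p


def add_commitments(c0: int, c1: int) -> int:
    """The group operation on commitments ("+" in the text's additive notation)."""
    return (c0 * c1) % p


if __name__ == "__main__":
    m0, m1, r0, r1 = (secrets.randbelow(q) for _ in range(4))
    lhs = add_commitments(commit(m0, r0), commit(m1, r1))
    print(lhs == commit(m0 + m1, r0 + r1))  # True
```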

Here, we assume that the $(s, s-1)$-DPF is instantiated with the DDH-based PRG introduced in Section 4.4 and that the group $\mathbb{G}$ used for the Pedersen commitments is the same order-$q$ group used in the PRG construction.

To execute the proof, the client first generates Pedersen commitments to elements of each of the $s$ DPF keys. Then each server $i$ can verify that the client computed the commitment to the $i$-th DPF key elements correctly. The servers use the homomorphic property of Pedersen commitments to generate commitments to the sum of the elements of the DPF keys. Finally, the client proves in zero knowledge that these sums have the correct values.

The protocols proceed as follows: 

1. The client generates vectors of Pedersen commitments $B_i$ and $S_i$ committing to each element of $b_i$ and $s_i$. The client sends the $B$ and $S$ vectors to every server.

2. To server $i$, the client sends the opening of the commitments $B_i$ and $S_i$. Each server $i$ verifies that $B_i$ and $S_i$ are valid commitments to the $b_i$ and $s_i$ vectors in the DPF key. If this check fails at some server $i$, server $i$ notifies the other servers and all servers reject the write request.

3. Using the homomorphic property of the commitments, each server can compute vectors of commitments $B_{\mathrm{sum}}$ and $S_{\mathrm{sum}}$ to the vectors $\sum_{i=0}^{s-1} b_i$ and $\sum_{i=0}^{s-1} s_i$.

4. Using a non-interactive zero-knowledge proof, the client proves to the servers that $B_{\mathrm{sum}}$ and $S_{\mathrm{sum}}$ are commitments to zero everywhere except at a single (secret) index $\ell_x$, and that $B_{\mathrm{sum}}[\ell_x]$ is a commitment to one.¹ This proof uses standard witness hiding techniques for discrete-logarithm-based zero-knowledge proofs [14,25]. If the proof is valid, the servers continue to check the $v$ vector.

This first protocol convinces each server that the b and 
s components of the DPF keys are well formed. Next, the 
servers check the v component: 

¹ Technically, this is a zero-knowledge proof of knowledge which proves that the client knows an opening of the commitments to the stated values.

1. For each server $i$, the client sums up the seed values $s_i$ it sent to server $i$: $\sigma_i = \sum_{j=0}^{x-1} s_i[j]$. The client then generates the output of $G(\sigma_i)$ and blinds it:

$$G_i = (\sigma_i P_0 + r_0 Q,\ \sigma_i P_1 + r_1 Q,\ \dots)$$

2. The client sends the $G$ values to all servers and the client sends the opening of $G_i$ to each server $i$.

3. Each server verifies that the openings are correct, and all servers reject the write request if this check fails at any server.

4. Using the homomorphic property of Pedersen commitments, every server can compute a vector of commitments $G_{\mathrm{sum}} = \big(\sum_{i=0}^{s-1} G_i\big) + v$. If $v$ is well formed, then the $G_{\mathrm{sum}}$ vector contains commitments to zero at every index except one (at which it will contain a commitment to the client's message $m$).

5. The client uses a non-interactive zero-knowledge proof to convince the servers that the vector of commitments $G_{\mathrm{sum}}$ contains commitments to zero at all indexes except one. If the proof is valid, the servers accept the write request.
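The algebra behind this $v$-vector check can be confirmed with a toy Python sketch. It builds seed shares with $\sum_i s_i = s^* \cdot e_{\ell_x}$, forms the blinded vectors $G_i$ and then $G_{\mathrm{sum}}$ as in steps 1 and 4, and verifies that $G_{\mathrm{sum}}$ equals a commitment to $m \cdot e_{\ell_y}$ under the combined blinding factors. That final comparison uses all of the randomness, so it is only a sanity check of the algebra and not the zero-knowledge proof of step 5. The toy group modulo $p = 2039$, the message encoding $M = 49$, and the function names are illustrative assumptions.

```python
import secrets

# Toy group (insecure): order-q subgroup of Z_p^*, p = 2q + 1. Additive notation
# from the text maps to multiplication mod p, and scalar multiplication to pow().
p, q = 2039, 1019


def rand_elt() -> int:
    return pow(2 + secrets.randbelow(p - 3), 2, p)


def gsum_matches(s=3, x=4, y=5, lx=1, ly=2, M=49) -> bool:
    params = [rand_elt() for _ in range(y)]  # PRG generators P_0..P_{y-1}
    Q = rand_elt()                           # Pedersen blinding generator
    # Seed shares of the (s, s-1)-DPF: sum_i s_i = s* * e_lx (mod q).
    s_star = secrets.randbelow(q)
    seeds = [[secrets.randbelow(q) for _ in range(x)] for _ in range(s - 1)]
    seeds.append([((s_star if j == lx else 0) - sum(t[j] for t in seeds)) % q
                  for j in range(x)])
    # v = m*e_ly - G(s*)
    v = [pow(pow(P_j, s_star, p), -1, p) for P_j in params]
    v[ly] = (v[ly] * M) % p
    # Step 1: sigma_i = sum_j s_i[j]; G_i[j] = sigma_i * P_j + r_ij * Q
    rs = [[secrets.randbelow(q) for _ in range(y)] for _ in range(s)]
    G = [[(pow(params[j], sum(seeds[i]) % q, p) * pow(Q, rs[i][j], p)) % p
          for j in range(y)] for i in range(s)]
    # Step 4: G_sum = (sum_i G_i) + v, computable by every server
    G_sum = list(v)
    for G_i in G:
        G_sum = [(a * b) % p for a, b in zip(G_sum, G_i)]
    # Sanity check with all blinding factors (no single server knows these):
    R = [sum(rs[i][j] for i in range(s)) % q for j in range(y)]
    expected = [((M if j == ly else 1) * pow(Q, R[j], p)) % p for j in range(y)]
    return G_sum == expected


if __name__ == "__main__":
    print(gsum_matches())  # True
```

The check works because $\sum_i \sigma_i = s^*$, so $\sum_i G_i$ carries exactly the pad $G(s^*)$ that $v$ subtracts, leaving only $m \cdot e_{\ell_y}$ plus blinding terms in $Q$.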

We prove in Appendix E that this protocol satisfies the standard notions of soundness, completeness, and zero-knowledge [15].

6 Experimental Evaluation 

To demonstrate that Riposte is a practical platform for traffic-analysis-resistant anonymous messaging, we implemented two variants of the system. The first variant uses the two-server distributed point function (Section 4.3) and uses the three-party protocol (Section 5.1) to prevent malicious clients from corrupting the database. This variant is relatively fast, since it relies primarily on symmetric-key primitives, but requires that no two of the three servers collude. Our results for the first variant include the cost of identifying and excluding malicious clients.

The second variant uses the $s$-server distributed point
function (Section 4.4). This variant protects against $s-1$
colluding servers, but relies on expensive public-key
operations. We have not implemented the zero-knowledge
proofs necessary to prevent disruptors for the $s$-server
protocol (Section 5.2), so the performance numbers represent
only an upper bound on the system throughput.

We wrote the prototype in the Go programming language
and have published the source code online at
https://bitbucket.org/henrycg/riposte/. We
used the DeterLab network testbed for our
experiments [62]. All of the experiments used commodity
servers running Ubuntu 14.04 with four-core AES-NI-enabled
Intel E3-1260L CPUs and 16 GB of RAM.

Our experimental network topology used between two
and ten servers (depending on the protocol variant in use)
and eight client nodes. In each of these experiments, the
eight client machines used many threads of execution to
submit write requests to the servers as quickly as
possible. In all experiments, the server nodes connected to
a common switch via 100 Mbps links, the client nodes
connected to a common switch via 1 Gbps links, and
the client and server switches connected via a 1 Gbps
link. The round-trip network latency between each pair
of nodes was 20 ms. We chose this network topology to
limit the bandwidth between the servers to that of a fast
WAN, but to leave client bandwidth unlimited so that the
small number of client machines could saturate the servers
with write requests.

Error bars in the charts indicate the standard deviation 
of the throughput measurements. 

6.1 Three-Server Protocol 

A three-server Riposte cluster consists of two database
servers and one audit server. The system maintains its
security properties as long as no two of these three servers
collude. We have fully implemented the three-server
protocol, including the audit protocol (Section 5.1), so the
throughput numbers listed here include the cost of detecting
and rejecting malicious write requests.

The prototype used AES-128 in counter mode as the
pseudo-random generator, Poly1305 as the keyed hash
function used in the audit protocol [8], and TLS for link
encryption.

Figure 3 shows how many client write requests our
Riposte cluster can service per second as the number of
160-byte rows in the database table grows. For a database table
of 64 rows, the system handles 751.5 write requests per
second. At a table size of 65,536 rows, the system
handles 32.8 requests per second. At a table size of 1,048,576
rows, the system handles 2.86 requests per second.

We chose the row length of 160 bytes because it was the
smallest multiple of 32 bytes large enough to contain a
140-byte Tweet. Throughput of the system depends only on
the total size of the table (number of rows × row length),
so larger row lengths might be preferable for other
applications. For example, an anonymous email system using
Riposte with 4096-byte rows could handle 2.86 requests
per second at a table size of 40,960 rows.
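The row-length choice and the email example are simple arithmetic. The sketch below reproduces them, assuming (as the paragraph states) that throughput depends only on the total table size; the helper names are our own.

```python
import math

def smallest_multiple(n_bytes, block):
    """Smallest multiple of `block` that is at least n_bytes."""
    return math.ceil(n_bytes / block) * block

def rows_for_same_table(rows, row_bytes, new_row_bytes):
    """Row count that keeps rows * row_length constant when the row length changes."""
    return rows * row_bytes // new_row_bytes

row_len = smallest_multiple(140, 32)                           # 160 bytes
table_bytes = 1_048_576 * row_len                              # 167,772,160 bytes
email_rows = rows_for_same_table(1_048_576, row_len, 4096)     # 40,960 rows
```

Both configurations describe the same ~168 MB table, which is why the text expects the same 2.86 requests per second for each.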
Figure 3: As the database table size grows, the throughput
of our system approaches the maximum possible given the
AES throughput of our servers.
Figure 4: Use of bandwidth-efficient DPFs gives a 768×
speed-up over the naive constructions, in which a client’s
request is as large as the database.

An upper bound on the performance of the system is
the speed of the pseudo-random generator used to stretch
out the DPF keys to the length of the database table. The
dashed line in Figure 3 indicates this upper bound (605
MB/s), as determined using an AES benchmark written in
Go. That line indicates the maximum possible throughput
we could hope to achieve without aggressive
optimization (e.g., writing portions of the code in assembly)
or more powerful machines. Migrating the
performance-critical portions of our implementation from Go to C
(using OpenSSL) might increase the throughput by a factor
of as much as 6×, since openssl speed reports AES
throughput of 3.9 GB/s, compared with the 605 MB/s we
obtain with Go’s crypto library. At very small table sizes,
the speed at which the server can set up TLS connections
with the clients limits the overall throughput to roughly
900 requests per second.
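A back-of-the-envelope version of the dashed line can be sketched under one assumption of ours (not a measured cost breakdown): each write request forces each database server to generate roughly one table's worth of PRG output. We also assume 1 MB = 10^6 bytes. The constants come from the text; the function name is hypothetical.

```python
GO_AES_BYTES_PER_SEC = 605e6        # Go AES benchmark (assumes 1 MB = 10**6 bytes)
OPENSSL_AES_BYTES_PER_SEC = 3.9e9   # `openssl speed` figure quoted in the text
TLS_LIMIT_RPS = 900                 # TLS-setup ceiling at very small table sizes
ROW_BYTES = 160

def prg_bound_rps(rows, prg_rate=GO_AES_BYTES_PER_SEC):
    """Requests/s ceiling if each request costs one table's worth of PRG output."""
    return min(prg_rate / (rows * ROW_BYTES), TLS_LIMIT_RPS)

go_to_c_speedup = OPENSSL_AES_BYTES_PER_SEC / GO_AES_BYTES_PER_SEC   # about 6.4x
```

Under this model, the bounds are about 57.7 requests per second at 65,536 rows and about 3.6 at 1,048,576 rows. The measured 32.8 and 2.86 requests per second both fall below these bounds.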

Figure 4 demonstrates how the request throughput
varies as the width of the table changes, while the
number of bytes in the table is held constant at 10 MB. This
figure demonstrates the performance advantage of using
a bandwidth-efficient $O(\sqrt{L})$ DPF (Section 4) over the
naive DPF (Section 3.1). Using a DPF with optimal table
width yields a throughput of 38.4 requests per second.
The extreme left and right ends of the figure indicate
the performance yielded by the naive construction, in
which making a write request involves sending a
$(1 \times L)$-dimensional vector to each server. At the far right
extreme of the figure, performance drops to 0.05 requests
per second, so DPFs yield a 768× speed-up.

Figure 5: The total client and server data transfer scales
sub-linearly with the size of the database (x-axis: database
table size, in number of 160-byte rows).

Figure 5 indicates the total number of bytes transferred
by one of the database servers and by the audit server
while processing a single client write request. The dashed
line at the top of the chart indicates the number of bytes
a client would need to send for a single write request if
we did not use bandwidth-efficient DPFs (i.e., the dashed
line indicates the size of the database table). As the
figure demonstrates, the total data transfer in a Riposte
cluster scales sub-linearly with the database size. When the
database table is 2.5 GB in size, the database server
transfers only a total of 1.23 MB to process a write request.
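Why a square-ish layout wins can be seen from a rough cost model. The model is our own assumption, not the prototype's wire format: a (2,1)-DPF key for an $L$-entry table arranged as an $x \times y$ grid ($xy \ge L$) carries $x$ bits, $x$ seeds of 16 bytes, and a $y$-entry vector $\mathbf{v}$. Sweeping $x$ shows that both extremes grow linearly in $L$, while a balanced grid is far smaller.

```python
import math

SEED_BYTES = 16  # assumption: 128-bit PRG seeds (AES-128 keys)

def dpf_key_bytes(x, y, entry_bytes):
    """Rough size of one key: x bits, x seeds, and a y-entry correction vector v."""
    return math.ceil(x / 8) + x * SEED_BYTES + y * entry_bytes

def best_layout(num_entries, entry_bytes):
    """Try every grid width x (with y = ceil(L / x)) and keep the smallest key."""
    best = None
    for x in range(1, num_entries + 1):
        y = math.ceil(num_entries / x)
        size = dpf_key_bytes(x, y, entry_bytes)
        if best is None or size < best[2]:
            best = (x, y, size)
    return best

L, ENTRY = 65_536, 160                     # a ~10 MB table of 160-byte rows
naive_bytes = L * ENTRY                    # 10,485,760 bytes sent to each server
x, y, key_bytes = best_layout(L, ENTRY)    # roughly 26 KB, with x near 800
```

The optimum balances $16x$ against $160y$, so it sits somewhat off the exact square root when seeds and entries differ in size. It remains $O(\sqrt{L})$ either way.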


6.2 s-Server Protocol

In some deployment scenarios, having strong protection
against server compromise may be more important than
performance or scalability. In these cases, the $s$-server
Riposte protocol provides the same basic functionality as
the three-server protocol described above, except that it
maintains privacy even if $s-1$ out of $s$ servers collude
or deviate arbitrarily from the protocol specification. We
implemented the basic $s$-server protocol but have not yet
implemented the zero-knowledge proofs necessary to
prevent malicious clients from corrupting the database state
(Section 5.2). These performance figures thus represent
an upper bound on the $s$-server protocol’s performance.
Adding the zero-knowledge proofs would require an
additional $O(\sqrt{L})$ elliptic curve operations per server in an
$L$-row database. The computational cost of the proofs would
almost certainly be dwarfed by the $O(L)$ elliptic curve
operations required to update the state of the database table.

Figure 6: Throughput of an eight-server Riposte cluster
using the (8,7)-distributed point function.

The experiments use the DDH-based seed-homomorphic
pseudo-random generator described
in Section 4.4 and they use the NIST P-256 elliptic curve
as the underlying algebraic group. The table row size is
fixed at 160 bytes.
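The defining property of a seed-homomorphic PRG is $G(s_1) + G(s_2) = G(s_1 + s_2)$. The toy Python sketch below demonstrates that property for a DDH-style PRG of the form $G(s) = (sP_1, \ldots, sP_n)$. It is written multiplicatively in a tiny insecure group modulo 2039 (instead of P-256), and its generators are hypothetical. It also shows where the cost comes from: every output block needs its own exponentiation, which plays the role of the elliptic curve scalar multiplication.

```python
P_MOD, ORDER = 2039, 1019   # toy group: quadratic residues mod the safe prime 2*1019 + 1
GENS = [pow(4, e, P_MOD) for e in (1, 3, 5, 7)]   # hypothetical public points P_1..P_4

def G(seed):
    """Toy DDH-style PRG: output block k is GENS[k]**seed (one exponentiation each)."""
    return [pow(g, seed % ORDER, P_MOD) for g in GENS]

def add(u, v):
    """The group operation ('+' in the text), applied componentwise."""
    return [a * b % P_MOD for a, b in zip(u, v)]

s1, s2 = 345, 987
lhs = add(G(s1), G(s2))   # G(s1) + G(s2)
rhs = G(s1 + s2)          # G(s1 + s2); equal to lhs because every GENS[k] has order 1019
```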

Figure 6 demonstrates the performance of an eight-server
Riposte cluster as the table size increases. At a
table size of 1,024 rows, the cluster can process one
request every 3.44 seconds. The limiting factor is the rate
at which the servers can evaluate the DDH-based
pseudo-random generator (PRG), since computing each 32-byte
block of PRG output requires a costly elliptic curve scalar
multiplication. The dashed line in the figure indicates the
maximum throughput obtainable using Go’s implementation
of P-256 on our servers, which in turn dictates the
maximum cluster throughput. Processing a single request
with a table size of one million rows would take nearly
one hour with this construction, compared to 0.3 seconds
in the AES-based three-server protocol.
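The "nearly one hour" estimate follows from extrapolating the 1,024-row measurement. The sketch below assumes, as we do here, that the per-request cost grows linearly with the number of rows, which is what one would expect when PRG evaluation dominates.

```python
MEASURED_SECONDS_AT_1024_ROWS = 3.44   # eight-server cluster, Figure 6
AES_RPS_AT_1M_ROWS = 2.86              # three-server cluster, Figure 3

def seconds_per_request(rows):
    """Assumes the per-request cost scales linearly with the number of rows."""
    return MEASURED_SECONDS_AT_1024_ROWS * rows / 1024

ddh_seconds = seconds_per_request(1_000_000)   # about 3,359 s, i.e. ~56 minutes
aes_seconds = 1 / AES_RPS_AT_1M_ROWS           # about 0.35 s per request
```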

Figure 7 shows how the throughput of the Riposte
cluster changes as the number of servers varies. Since the
workload is heavily CPU-bound, the throughput only
decreases slightly as the number of servers increases from
two to ten.

6.3 Discussion: Whistleblowing and Microblogging with Million-User Anonymity Sets

Whistleblowers, political activists, or others discussing
sensitive or controversial issues might benefit from an
anonymous microblogging service. A whistleblower, for
example, might want to anonymously blog about an
instance of bureaucratic corruption in her organization.
The utility of such a system depends on the size of the
anonymity set it would provide: if a whistleblower is only
anonymous amongst a group of ten people, it would be
easy for the whistleblower’s employer to retaliate against
everyone in the anonymity set. Mounting this
“punish-them-all” attack does not require breaking the anonymity
system itself, since the anonymity set is public. As the
anonymity set size grows, however, the feasibility of the
“punish-them-all” attack quickly tends to zero. At an
anonymity set size of 1,000,000 clients, mounting a
“punish-them-all” attack would be prohibitively expensive
in most situations.

Figure 7: Throughput of Riposte clusters using two different
database table sizes as the number of servers varies.

Riposte can handle such large anonymity sets as long
as (1) clients are willing to tolerate hours of messaging
latency, and (2) only a small fraction of clients writes into
the database in each time epoch. Both of these
requirements are satisfied in the whistleblowing scenario. First,
whistleblowers might not care if the system delays their
posts by a few hours. Second, the vast majority of users
of a microblogging service (especially in the
whistleblowing context) are more likely to read posts than write them.
To get very large anonymity sets, maintainers of an
anonymous microblogging service could take advantage of the
large set of “read-only” users to provide anonymity for the
relatively small number of “read-write” users.

The client application for such a microblogging
service would enable read-write users to generate and
submit Riposte write requests to a Riposte cluster running the
microblogging service. However, the client application
would also allow read-only users to submit an “empty”
write request to the Riposte cluster that would always
write a random message into the first row of the Riposte
database. From the perspective of the servers, a read-only
client would be indistinguishable from a read-write client.
By leveraging read-only users in this way, we can increase
the size of the anonymity set without needing to increase
the size of the database table.
To demonstrate that Riposte can support very large
anonymity set sizes (albeit with high latency), we
configured a cluster of Riposte servers with a 65,536-row
database table and left it running for 32 hours. In that
period, the system processed a total of 2,895,216 write
requests at an average rate of 25.19 requests per second.
(To our knowledge, this is the largest anonymity set ever
constructed in a system that offers protection against
traffic analysis attacks.) Using the techniques in Section 3.2,
a table of this size could handle 0.3% of users writing at
a collision rate of under 5%. Thus, to get an anonymity
set of roughly 1,000,000 users with a three-server Riposte
cluster and a database table of size 65,536, the time epoch
must be at least 11 hours long.
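The numbers in this paragraph can be checked with a short sketch. Two caveats apply. First, the collision figure here is a plain balls-into-bins estimate, assuming each writer picks a row uniformly and independently. It is not the Section 3.2 technique itself, and it ignores the first row reserved for read-only users' dummy writes. Second, the epoch length assumes the 25.19 requests-per-second average from the 32-hour run.

```python
ROWS = 65_536                 # database table size used in the 32-hour run
USERS = 1_000_000             # target anonymity set
WRITE_FRACTION = 0.003        # 0.3% of users write in an epoch
RATE_RPS = 25.19              # average rate observed in that run

def collision_rate(writers, rows):
    """Chance that a given write shares its row with at least one other write,
    assuming every writer picks one of `rows` rows uniformly and independently."""
    return 1 - (1 - 1 / rows) ** (writers - 1)

writers = round(USERS * WRITE_FRACTION)    # 3,000 writers
rate = collision_rate(writers, ROWS)       # about 0.045
epoch_hours = USERS / RATE_RPS / 3600      # about 11.03 hours
```

Even this naive estimate lands under the 5% collision rate quoted above, and one million requests at the observed rate take just over 11 hours.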

As of 2013, Twitter reported an average throughput of
5,700 140-byte Tweets per second [56]. That is equivalent
to roughly 5,000 of our 160-byte messages per second.
At a table size of one million messages, our Riposte
cluster’s end-to-end throughput is 2.86 write requests per
second (Figure 3). To handle the same volume of Tweets as
Twitter does with anonymity set sizes on the order of
hundreds of thousands of clients, we would need to increase
the computing power of our cluster by “only” a factor of
roughly 1,750.² Since we are using only three servers now, we
would need roughly 5,250 servers (split into three
non-colluding data centers) to handle the same volume of
traffic as Twitter. Furthermore, since the audit server is just
doing string comparisons, the system would likely need
many fewer audit servers than database servers, so the
total number of servers required might be closer to 4,000.
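The scaling estimate above chains a few multiplications. The sketch below reproduces them under the same linear-scaling assumption stated in footnote 2, using the text's rounded figure of 5,000 messages per second.

```python
TWEETS_PER_SEC = 5_700
TWEET_BYTES, ROW_BYTES = 140, 160
RIPOSTE_RPS_1M_ROWS = 2.86     # three-server cluster at one million rows (Figure 3)
SERVERS_NOW = 3

msgs_per_sec = TWEETS_PER_SEC * TWEET_BYTES / ROW_BYTES   # 4,987.5 ("roughly 5,000")
scale_factor = 5_000 / RIPOSTE_RPS_1M_ROWS                # ~1,748 ("roughly 1,750")
servers_needed = SERVERS_NOW * scale_factor               # ~5,245 ("roughly 5,250")
```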

7 Related Work 

Anonymity systems fall into one of two general
categories: systems that provide low-latency communication
and those that protect against traffic analysis attacks by a
global network adversary.

Aqua [57], Crowds [75], LAP [52], ShadowWalker [63],
Tarzan [35], and Tor [31] belong to
the first category of systems: they provide an anonymous
proxy for real-time Web browsing, but they do not protect
against an adversary who controls the network, many
of the clients, and some of the nodes on a victim’s path
through the network. Even providing a formal definition
of anonymity for low-latency systems is challenging [53]
and such definitions typically do not capture the need to
protect against timing attacks.

² We assume here that scaling the number of machines by a factor of k
increases our throughput by a factor of k. This assumption is
reasonable given our workload, since the processing of write requests is an
embarrassingly parallel task.


Even so, it would be possible to combine Tor (or
another low-latency anonymizing proxy) and Riposte to
build a “best of both” anonymity system: clients would
submit their write requests to the Riposte servers via
the Tor network. In this configuration, even if all of
the Riposte servers colluded, they could not learn which
user wrote which message without also breaking the
anonymity of the Tor network.

David Chaum’s “cascade” mix networks were one of
the first systems devised with the specific goal of
defending against traffic-analysis attacks [18]. Since then, there
have been a number of mix-net-style systems proposed,
many of which explicitly weaken their protections against
a near-omnipresent adversary [78] to improve prospects
for practical usability (e.g., for email traffic) [27]. In
contrast, Riposte attempts to provide very strong anonymity
guarantees at the price of usability for interactive
applications.

E-voting systems (also called “verifiable shuffles”)
achieve the sort of privacy properties that Riposte offers,
and some systems even provide stronger voting-specific
guarantees (receipt-freeness, proportionality, etc.), though
most e-voting systems cannot provide the forward
security property that Riposte offers (Section 3.3)
[1,22,36,49,50,69,73].

In a typical e-voting system, voters submit their
encrypted ballots to a few trustees, who collectively
shuffle and decrypt them. While it is possible to repurpose
e-voting systems for anonymous messaging, they
typically require expensive zero-knowledge proofs or are
inefficient when message sizes are large. Mix-nets that do
not use zero-knowledge proofs of correctness typically do
not provide privacy in the face of active attacks by a subset
of the mix servers.

For example, the verifiable shuffle protocol of Bayer
and Groth [5] is one of the most efficient in the
literature. Their shuffle implementation, when used with an
anonymity set of size $N$, requires $16N$ group
exponentiations per server and data transfer $O(N)$. In addition,
messages must be small enough to be encoded in a single
group element (a few hundred bytes at most). In
contrast, our protocol requires $O(L)$ AES operations and data
transfer $O(\sqrt{L})$, where $L$ is the size of the database table.
When messages are short and when the writer/reader ratio
is high, the Bayer-Groth mix may be faster than our
system. Conversely, when messages are long and when the
writer/reader ratio is low (i.e., $L \ll N$), our system is
faster.

Chaum’s Dining Cryptographers network (DC-net) is
an information-theoretically secure anonymous
broadcast channel [17]. A DC-net provides the same strong
anonymity properties as Riposte does, but it requires every
user of a DC-net to participate in every run of the
protocol. As the number of users grows, this quickly becomes
impractical.

The Dissent [81] system introduced the idea of
using partially trusted servers to make DC-nets practical in
distributed networks. Dissent requires weaker trust
assumptions than our three-server protocol does, but it
requires clients to send $O(L)$ bits to each server per time
epoch (compared with our $O(\sqrt{L})$). Also, excluding a
single disruptor in a 1,000-client deployment takes over an
hour. In contrast, Riposte can exclude disruptors as fast
as it processes write requests (tens to hundreds per
second, depending on the database size). Recent work [24]
uses zero-knowledge techniques to speed up disruption
resistance in Dissent (building on ideas of Golle and
Juels [48]). Unfortunately, these techniques limit the
system’s end-to-end throughput to 30
KB/s, compared with Riposte’s 450+ MB/s.

Herbivore scales DC-nets by dividing users into many
small anonymity sets [42]. Riposte creates a single large
anonymity set, and thus enables every client to be
anonymous amongst the entire set of honest clients.

Our DPF constructions make extensive use of prior
work on private information retrieval (PIR) [20,21,37,41].
Recent work demonstrates that it is possible to make
theoretical PIR fast enough for practical use [29,30,44].
Function secret sharing [13] generalizes DPFs to allow
sharing of more sophisticated functions (rather than just point
functions). This more powerful primitive may prove
useful for PIR and anonymous messaging applications in the
future.

Gertner et al. [40] consider symmetric PIR protocols, in 
which the servers prevent dishonest clients from learning 
about more than a single row of the database per query. 
The problem that Gertner et al. consider is, in a way, the 
dual of the problem we address in Section 5, though their 
techniques do not appear to apply directly in our setting. 

Ostrovsky and Shoup first proposed using PIR protocols
as the basis for writing into a database shared across a set
of servers [71]. However, Ostrovsky and Shoup
considered only the case of a single honest client, who uses the
untrusted database servers for private storage. Since many
mutually distrustful clients use a single Riposte cluster,
our protocol must also handle malicious clients.

Pynchon Gate [76] builds a private point-to-point
messaging system from mix-nets and PIR. Clients
anonymously upload messages to email servers using a
traditional mix-net and download messages from the email
servers using a PIR protocol. Riposte could replace the
mix-nets used in the Pynchon Gate system: clients could
anonymously write their messages into the database
using Riposte and could privately read incoming messages
using PIR.

8 Conclusion and Open Questions 

We have presented Riposte, a new system for
anonymous messaging. To the best of our knowledge,
Riposte is the first system that simultaneously (1) thwarts
traffic analysis attacks, (2) prevents malicious clients
from anonymously disrupting the system, and (3) enables
million-client anonymity set sizes. We achieve these goals
through novel application of private information retrieval
and secure multiparty computation techniques. We have
demonstrated Riposte’s practicality by implementing it
and evaluating it with anonymity sets of over two million
nodes. This work leaves open a number of questions for
future work, including:

• Does there exist an $(s, s-1)$-DPF construction for
$s > 2$ that uses only symmetric-key operations?

• Are there efficient techniques (i.e., using no
public-key primitives) for achieving disruption resistance
without the need for a non-colluding audit server?

• Are there DPF constructions that enable processing
write requests in amortized time $o(L)$, for a length-$L$
database?

With the design and implementation of Riposte, we have 
demonstrated that cryptographic techniques can make 
traffic-analysis-resistant anonymous microblogging and 
whistleblowing more practical at Internet scale. 
A Definition of Write Privacy

An $(s,t)$-write-private database scheme consists of the
following three (possibly randomized) algorithms:

$\mathsf{Write}(\ell, m) \to (w^{(0)}, \ldots, w^{(s-1)})$. Clients use the Write
functionality to generate the write request queries
sent to the $s$ servers. The Write function takes
as input a message $m$ (from some finite message
space) and an integer $\ell$ and produces a set of $s$ write
requests, one per server.

$\mathsf{Update}(\sigma, w) \to \sigma'$. Servers use the Update functionality
to process incoming write requests. The Update
function takes as input a server’s internal state $\sigma$ and a
write request $w$, and outputs the updated state of the
server $\sigma'$.

$\mathsf{Reveal}(\sigma_0, \ldots, \sigma_{s-1}) \to D$. At the end of the time epoch,
servers use the Reveal functionality to recover the
contents of the database. The Reveal function takes
as input the set of states from each of the $s$ servers
and produces the plaintext database contents $D$.

We define the write-privacy property using the following
security game, played between the adversary (who
statically corrupts up to $t$ servers and all but two clients)
and a challenger.

1. In the first step, the adversary performs the following
actions:

• The adversary selects a subset $\mathcal{A}_s \subseteq \{0, \ldots, s-1\}$
of the servers, such that $|\mathcal{A}_s| \le t$. The set $\mathcal{A}_s$
represents the set of adversarial servers. Let the
set $\mathcal{H}_s = \{0, \ldots, s-1\} \setminus \mathcal{A}_s$ represent the set of
honest servers.

• The adversary selects a set of clients
$\mathcal{H}_c \subseteq \{0, \ldots, n-1\}$, such that $|\mathcal{H}_c| \ge 2$, representing
the set of honest clients. The adversary selects
one message-location pair per honest client:

$$\mathcal{M} = \{(i, m_i, \ell_i) \mid i \in \mathcal{H}_c\}$$

The adversary sends $\mathcal{A}_s$ and $\mathcal{M}$ to the challenger.

2. In the second step, the challenger responds to the
adversary:

• For each $(i, m_i, \ell_i) \in \mathcal{M}$, the challenger
generates a write request:

$$(w_i^{(0)}, \ldots, w_i^{(s-1)}) \leftarrow \mathsf{Write}(\ell_i, m_i)$$

The set of shares of the $i$th write request
revealed to the malicious servers is
$W_i = \{w_i^{(j)}\}_{j \in \mathcal{A}_s}$.

In the next steps of the game, the challenger
will randomly reorder the honest clients’ write
requests. The adversary should learn nothing
about which client wrote what, despite all the
information at its disposal.

• The challenger then samples a random
permutation $\pi$ over $\{0, \ldots, |\mathcal{H}_c|-1\}$. The challenger
sends the following set of write requests to the
adversary, permuted according to $\pi$:

$$(W_{\pi(0)}, W_{\pi(1)}, \ldots, W_{\pi(|\mathcal{H}_c|-1)})$$

3. For each client $i$ in $\{0, \ldots, n-1\} \setminus \mathcal{H}_c$, the adversary
computes a write request $(w_i^{(0)}, \ldots, w_i^{(s-1)})$ (possibly
according to some malicious strategy) and sends the
set of these write requests to the challenger.

4. • For each server $j \in \mathcal{H}_s$, the challenger
computes the server’s final state $\sigma_j$ by running the
Update functionality on each of the $n$ client
write requests in order. Let $S = \{(j, \sigma_j) \mid j \in \mathcal{H}_s\}$
be the set of states of the honest servers.

• The challenger samples a bit $b \xleftarrow{R} \{0,1\}$. If
$b = 0$, the challenger sends $(S, \pi)$ to the
adversary. Otherwise, the challenger samples a fresh
permutation $\pi^*$ on $\mathcal{H}_c$ and sends $(S, \pi^*)$ to the
adversary.

5. The adversary makes a guess $b'$ for the value of $b$.

The adversary wins the game if $b = b'$. We define
the adversary’s advantage as $|\Pr[b = b'] - 1/2|$. The
scheme maintains $(s,t)$-write privacy if no efficient
adversary wins the game with non-negligible advantage (in
the implicit security parameter).

B Correctness Proof for (2,1)-DPF 

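This appendix proves correctness of the distributed point
function construction of Section 4.3. For the scheme to be correct,
it must be that, for $(k_A, k_B) \leftarrow \mathsf{Gen}(\ell, m)$ and for all $\ell' \in \mathbb{Z}_L$:

$$\mathsf{Eval}(k_A, \ell') + \mathsf{Eval}(k_B, \ell') = P_{\ell,m}(\ell'),$$

where $P_{\ell,m}(\ell') = m$ if $\ell' = \ell$ and $0$ otherwise.
Let $(\ell_x, \ell_y)$ be the tuple in $\mathbb{Z}_x \times \mathbb{Z}_y$ representing location $\ell$
and let $(\ell'_x, \ell'_y)$ be the tuple representing $\ell'$. Let:

$$m'_A \leftarrow \mathsf{Eval}(k_A, \ell'), \qquad m'_B \leftarrow \mathsf{Eval}(k_B, \ell').$$

We use a case analysis to show that the left-hand side of
the equation above equals $P_{\ell,m}(\ell')$ for all $\ell'$: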

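Case I: $\ell_x \neq \ell'_x$. When $\ell_x \neq \ell'_x$, the seeds $\mathbf{s}_A[\ell'_x]$ and $\mathbf{s}_B[\ell'_x]$
are equal, so $g_A = g_B$, where $g_A = G(\mathbf{s}_A[\ell'_x])$ and
$g_B = G(\mathbf{s}_B[\ell'_x])$. Similarly, $\mathbf{b}_A[\ell'_x] = \mathbf{b}_B[\ell'_x]$. The
output $m'_A$ will be $g_A[\ell'_y] + \mathbf{b}_A[\ell'_x]\,\mathbf{v}[\ell'_y]$. The output $m'_B$
will be identical to $m'_A$. Since the field is a binary
field, adding a value to itself results in the zero
element, so the sum $m'_A + m'_B$ will be zero as desired.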
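Case II: $\ell_x = \ell'_x$ and $\ell_y \neq \ell'_y$. When $\ell_x = \ell'_x$, the seeds
$\mathbf{s}_A[\ell'_x]$ and $\mathbf{s}_B[\ell'_x]$ are not equal, so $g_A \neq g_B$.
Similarly, $\mathbf{b}_A[\ell'_x] \neq \mathbf{b}_B[\ell'_x]$. By construction,

$$\mathbf{v}[\ell'_y] = (m \cdot \mathbf{e}_{\ell_y})[\ell'_y] + g_A[\ell'_y] + g_B[\ell'_y],$$

and since $\ell_y \neq \ell'_y$, the first term is zero, so
$\mathbf{v}[\ell'_y] = g_A[\ell'_y] + g_B[\ell'_y]$. Assume $\mathbf{b}_A[\ell'_x] = 0$, so that
$\mathbf{b}_B[\ell'_x] = 1$ (an analogous argument applies when
$\mathbf{b}_A[\ell'_x] = 1$). The sum $m'_A + m'_B$ will then be:

$$m'_A + m'_B = g_A[\ell'_y] + g_B[\ell'_y] + \mathbf{v}[\ell'_y] = 0.$$

Case III: $\ell_x = \ell'_x$ and $\ell_y = \ell'_y$. This is the same as Case
II, except that $(m \cdot \mathbf{e}_{\ell_y})[\ell'_y] = m$ when $\ell_y = \ell'_y$, so the
sum $m'_A + m'_B = m$, as desired.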
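The case analysis can be checked mechanically with a minimal Python sketch of a (2,1)-DPF. The sketch follows the structure the proof relies on: bit vectors $\mathbf{b}$ and seed vectors $\mathbf{s}$ that agree everywhere except at $\ell_x$, plus a correction vector $\mathbf{v} = m\cdot\mathbf{e}_{\ell_y} + G(\mathbf{s}_A[\ell_x]) + G(\mathbf{s}_B[\ell_x])$. It is an illustration, not the prototype's code. SHA-256 in counter mode stands in for the AES-128-CTR PRG, addition in the binary field is byte-wise XOR, and the entry size, grid shape, and function names (`gen`, `eval_key`) are our own choices.

```python
import hashlib
import secrets

ENTRY_BYTES = 16   # illustrative entry size (real rows are 160 bytes)
SEED_BYTES = 16

def prg(seed, y):
    """Stretch a seed into y entries; SHA-256 in counter mode stands in for AES-CTR."""
    return [hashlib.sha256(seed + j.to_bytes(4, "big")).digest()[:ENTRY_BYTES]
            for j in range(y)]

def xor(a, b):
    """Addition in a binary field: byte-wise XOR."""
    return bytes(u ^ w for u, w in zip(a, b))

def gen(ell, m, x, y):
    """Split the point function (ell -> m) over an x-by-y grid into keys k_A, k_B."""
    ell_x, ell_y = divmod(ell, y)
    b_a = [secrets.randbelow(2) for _ in range(x)]
    b_b = list(b_a)
    b_b[ell_x] ^= 1                                   # bits differ only at ell_x
    s_a = [secrets.token_bytes(SEED_BYTES) for _ in range(x)]
    s_b = list(s_a)
    s_b[ell_x] = secrets.token_bytes(SEED_BYTES)      # seeds differ only at ell_x
    g_a, g_b = prg(s_a[ell_x], y), prg(s_b[ell_x], y)
    zero = bytes(ENTRY_BYTES)
    # v = m * e_{ell_y} + G(s_A[ell_x]) + G(s_B[ell_x])
    v = [xor(xor(m if j == ell_y else zero, g_a[j]), g_b[j]) for j in range(y)]
    return (b_a, s_a, v), (b_b, s_b, v)

def eval_key(key, ell_prime, y):
    """Eval(k, l') = G(s[l'_x])[l'_y] + b[l'_x] * v[l'_y]."""
    b, s, v = key
    lx, ly = divmod(ell_prime, y)
    out = prg(s[lx], y)[ly]
    return xor(out, v[ly]) if b[lx] else out
```

Summing `eval_key(k_a, l, y)` and `eval_key(k_b, l, y)` over every location `l` yields the message at the target location and zero everywhere else, mirroring Cases I through III.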

C Proofs for the AlmostEqual Protocol

This appendix proves security of the AlmostEqual protocol
of Section 5.1.

Soundness. We compute the probability that an honest 
audit server will output “1” when the vectors are not equal 
at exactly one index. First, consider the case when the v 
vectors are equal everywhere. In this case, the test vectors 
that servers A and B send to the audit server will be equal 
everywhere and the audit server will always output “0.” 

Next, consider the case when the $\mathbf{v}$ vectors differ at
$k+1$ positions, where $k > 0$. The soundness error $\varepsilon_k$ is
equal to the probability that, for every index $i$ where the
vectors are unequal (except one), there is a hash collision.
Since the probability of many hash collisions is bounded
by the probability of a single hash collision, $\varepsilon_k \le \varepsilon_1$. We
know the probability $\varepsilon_1$ of a single collision from the
properties of a pairwise-independent hash function
family $\mathcal{H}$, where each member of the family has range $R$:

$$\varepsilon_1 = \Pr[h \leftarrow \mathcal{H} : h(\mathbf{v}_A[i]) = h(\mathbf{v}_B[i])] \le \frac{1}{|R|}.$$

The overall soundness error is then at most $\varepsilon \le 1/|R|$.
Since $|R|$ (the output space of the hash function) is
exponentially large in the security parameter, this probability
is negligible.

Completeness. If the vectors $\mathbf{v}_A$ and $\mathbf{v}_B$ differ in exactly
one position, the audit server must output “1” with
overwhelming probability. Since the audit server only
outputs “1” if exactly one element of the test vectors differs,
the protocol will return an incorrect result whenever the hash
function collides at the position where $\mathbf{v}_A$ and $\mathbf{v}_B$ differ.
The probability of this event happening is negligible, however, as
long as the length of the vectors is polynomial in the
security parameter.

Zero Knowledge. The zero-knowledge property need 
only hold when the vectors differ at exactly one index. In 
this case, servers A and B receive a single bit from the 
audit server (a “1”), so the simulation is trivial for the 
database servers. Thus, we only need to prove that the 
zero-knowledge property holds for the audit server. 

Whenever the vectors differ at exactly one position, the
audit server can also simulate its view of the protocol. The
audit server simulator runs by picking length-$n$ vectors of
random elements in the range of the pairwise-independent hash
function family $\mathcal{H}$ subject to the constraint that the vectors are
equal everywhere except at a single random index $\ell \in \mathbb{Z}_n$. The simulator outputs the
two vectors as the vectors received from servers A and B.

The simulation is valid because $\mathcal{H}$ is a
pairwise-independent hash function family. Let $\mathcal{H}$ be a family of
hash functions $h : D \to R$. Then for all $x \neq y \in D$, by
definition of pairwise independence:

$$\Pr[h \leftarrow \mathcal{H} : h(x) = h(y)] \le \frac{1}{|R|}.$$

This property implies that the two vectors sent to the audit
server leak no information about the $\mathbf{v}$ vectors, since an
honest client’s $\mathbf{v}$ vector will be independent of the choice
of hash function $h$, and so every element of the vectors
sent to the audit server takes on every value in $R$ with
equal probability. As in the real protocol, the simulated
vectors differ at exactly one random index.
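A minimal Python sketch of the AlmostEqual check analyzed in this appendix follows. It rests on assumptions we state explicitly, since the protocol details of Section 5.1 are not reproduced here:

- Entries are integers in $\mathbb{Z}_p$ with $p = 2^{61}-1$.
- The pairwise-independent family is $h_{a,c}(z) = (az + c) \bmod p$.
- Servers A and B share $h$ and, as an additional assumption of this sketch, a random permutation of positions, so the audit server cannot tell where the vectors differ.
- The audit server outputs 1 iff the test vectors differ at exactly one index.

The function names are hypothetical.

```python
import secrets

PRIME = (1 << 61) - 1     # Mersenne prime; hash range R = Z_PRIME (toy choice)

def sample_hash():
    """Draw h(z) = (a*z + c) mod PRIME from a pairwise-independent family."""
    a, c = secrets.randbelow(PRIME), secrets.randbelow(PRIME)
    return lambda z: (a * z + c) % PRIME

def make_test_vector(vec, h, perm):
    """Server A or B: hash every entry, then reorder with the shared permutation."""
    hashed = [h(z) for z in vec]
    return [hashed[i] for i in perm]

def audit(t_a, t_b):
    """Audit server: output 1 iff the test vectors differ at exactly one index."""
    return 1 if sum(p != q for p, q in zip(t_a, t_b)) == 1 else 0

def almost_equal(v_a, v_b):
    """One run; h and perm are shared by A and B but hidden from the audit server."""
    h = sample_hash()
    perm = list(range(len(v_a)))
    secrets.SystemRandom().shuffle(perm)
    return audit(make_test_vector(v_a, h, perm), make_test_vector(v_b, h, perm))
```

With vectors that differ in two positions, the audit outputs 1 only if the hash collides on one of the two unequal pairs. By a union bound, that happens with probability at most $2/p$ here, in line with the soundness bound above.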

D Security Proof for Three-Server Protocol

This appendix contains the security proofs for the
three-server protocol for detecting malicious client requests
(Section 5.1).

Completeness. If the pair of keys is well-formed, then
the $\mathbf{b}_A$ and $\mathbf{b}_B$ vectors (and also the $\mathbf{s}_A$ and $\mathbf{s}_B$ vectors) are
equal at every index $i \neq \ell_x$ and they differ at index $i = \ell_x$.
Even in the negligibly unlikely event that the random seed
chosen at $\mathbf{s}_A[\ell_x]$ is equal to the random seed chosen at
$\mathbf{s}_B[\ell_x]$, the test vectors $\mathbf{t}_A$ and $\mathbf{t}_B$ will still differ because
$\mathbf{b}_A[\ell_x] \neq \mathbf{b}_B[\ell_x]$. Thus, a correct pair of $\mathbf{b}$ and $\mathbf{s}$ vectors
will pass the first AlmostEqual check.

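The second AlmostEqual check is more subtle. If the $\mathbf{v}$
vector is well formed then, letting $\ell_x$ be the index where
the $\mathbf{s}$ vectors differ, we have:

$$\begin{aligned}
\mathbf{u}_B &= \Big(\sum_{\ell'} G(\mathbf{s}_A[\ell'])\Big) + G(\mathbf{s}_A[\ell_x]) + G(\mathbf{s}_B[\ell_x]) + \mathbf{v} \\
&= \mathbf{u}_A + G(\mathbf{s}_A[\ell_x]) + G(\mathbf{s}_B[\ell_x]) + \mathbf{v} \\
&= \mathbf{u}_A + m \cdot \mathbf{e}_{\ell_y}
\end{aligned}$$

If $\mathbf{v}$ is well-formed, then the two test vectors $\mathbf{u}_A$ and $\mathbf{u}_B$ differ
only at index $\ell_y$.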

Soundness. To show soundness, we must bound the 
probability that the audit server will output “1” when the 
servers take a malformed pair of DPF keys as input. If 
the b and s vectors are not equal everywhere except at one 
index, the soundness of the AlmostEqual protocol implies 
that the audit server will return “0” with overwhelming 
probability when invoked the first time. 

Now, given that the $\mathbf{s}$ vectors differ at one index, we
can demonstrate that if the $\mathbf{u}$ vectors pass the second
AlmostEqual check, then $\mathbf{v}$ is also well formed. Let $\ell_x$
be the index at which the $\mathbf{s}$ vectors differ. Write the
values of the $\mathbf{s}$ vectors at index $\ell_x$ as $s^*_A$ and $s^*_B$. Then, by
construction:

$$\mathbf{u}_A = \Big(\sum_{\ell' \neq \ell_x} G(\mathbf{s}_A[\ell'])\Big) + G(s^*_A)$$

$$\mathbf{u}_B = \Big(\sum_{\ell' \neq \ell_x} G(\mathbf{s}_B[\ell'])\Big) + G(s^*_B) + \mathbf{v}$$

The first terms of these two expressions are equal
(because the $\mathbf{s}$ vectors are equal almost everywhere). Thus,
to violate the soundness property, an adversary must
construct a tuple $(s^*_A, s^*_B, \mathbf{v})$ such that the vectors $G(s^*_A)$ and
$(G(s^*_B) + \mathbf{v})$ differ at exactly one index and such that
$\mathbf{v} \neq G(s^*_A) + G(s^*_B) + m \cdot \mathbf{e}_{\ell_y}$ for every $\ell_y$ and $m$. This is a
contradiction, however, since if $G(s^*_A)$ and $(G(s^*_B) + \mathbf{v})$ differ at exactly one
index, then:

$$m \cdot \mathbf{e}_{\ell_y} = G(s^*_A) + \big(G(s^*_B) + \mathbf{v}\big)$$

for some $\ell_y$ and $m$, by definition of $m \cdot \mathbf{e}_{\ell_y}$.

Zero Knowledge. The audit server can simulate its view 
of a successful run of the protocol (one in which the in¬ 
put keys are well-formed) by invoking the AlmostEqual 
simulator twice. 


E Security Proof for Zero-Knowledge Protocol

Completeness. Completeness for the first half of the
protocol, which checks the form of the B and S vectors,
follows directly from the construction of those vectors.

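The one slightly subtle step comes in Step 5 of the
second half of the protocol. For the protocol to be complete,
$G_{\mathrm{sum}}$ must contain commitments to zero at every index
except one. This is true because the committed values satisfy:

$$G_{\mathrm{sum}} = \Big(\sum_{i=0}^{s-1} G_i\Big) + \mathbf{v} = G(s^*) + m \cdot \mathbf{e}_{\ell_y} - G(s^*) = m \cdot \mathbf{e}_{\ell_y}$$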


Soundness. The soundness of the non-interactive
zero-knowledge proof in the first half of the protocol
guarantees that the B vectors sum to $\mathbf{e}_{\ell_x}$ and that the S vectors
sum to $s^* \cdot \mathbf{e}_{\ell_x}$ for some values $\ell_x \in \mathbb{Z}_x$ and $s^* \in \mathbb{S}$.

We must now argue that the probability that all servers
accept an invalid write request in the second half of the
protocol is negligible. The soundness property of the
underlying zero-knowledge proof used in Step 5 implies that
the vector $G_{\mathrm{sum}}$ contains commitments to zero at all
indices except one. A client who violates the soundness
property produces a vector $\mathbf{v}$ and seed value $s^*$ such that
$\big(\sum_{i=0}^{s-1} G_i\big) + \mathbf{v} = m \cdot \mathbf{e}_{\ell_y}$ for some values $\ell_y \in \mathbb{Z}_y$ and $m \in \mathbb{G}$,
and such that $\mathbf{v} \neq m \cdot \mathbf{e}_{\ell_y} - G(s^*)$. This is a contradiction,
however, since $\big(\sum_{i=0}^{s-1} G_i\big) = G(s^*)$ by the first half of the
protocol, and so:

$$\Big(\sum_{i=0}^{s-1} G_i\Big) + \mathbf{v} = m \cdot \mathbf{e}_{\ell_y} = G(s^*) + \mathbf{v}$$

Finally, we conclude that $\mathbf{v} = m \cdot \mathbf{e}_{\ell_y} - G(s^*)$.

Zero Knowledge. The servers can simulate every
message they receive during a run of the protocol. In
particular, they see only Pedersen commitments, which are
statistically hiding, and non-interactive zero-knowledge proofs,
which are simulatable in the random-oracle model [6].